We consider several ways to test for topology directly in harmonic space by comparing the measured
$a_{\ell m}$ with the expected correlation matrices. Two tests are of a frequentist nature while we
compute the Bayesian evidence as the third test. Using correlation matrices for cubic and slab-space 
tori, we study how these tests behave as a function of the minimal scale probed and as a function of 
the size of the universe. We also apply them to different first-year WMAP CMB maps and confirm 
that the universe is compatible with being infinitely big for the cases considered. We argue that 
there is an information theoretical limit (given by the Kullback-Leibler divergence) on the size of 
the topologies that can be detected. 
I. INTRODUCTION

General Relativity has been extremely successful in de- 
scribing the large-scale features of our universe. But the 
global shape of space-time is a quantity that is not de- 
termined by the local equations of General Relativity. 
An intriguing possibility is therefore that our universe is 
much smaller than the size of the particle horizon today. 

In the standard model, the universe is described by a Friedmann-Lemaitre-Robertson-Walker (FLRW) type metric which is homogeneous and isotropic. If the topology of the universe is not trivial, then we are dealing with a quotient space $X/\Gamma$ where $X$ is one of the usual simply connected FLRW spaces (spherical, Euclidean or hyperbolic) and $\Gamma$ is a discrete and fixed-point free symmetry group that describes the topology. This construction does not affect local physics but changes the boundary conditions (see e.g. [?] and references therein).

This could potentially explain some of the anomalies found in the first-year WMAP data. For example, the perturbations of the cosmic fluids need to be invariant under $\Gamma$. Therefore the largest wavelength of the fluctuations in the CMB cannot exceed the size of the universe, and so the suppression (and maybe the strange alignment) of the lowest CMB multipoles might be due to a non-trivial topology [?]. Additionally, the last scattering surface can wrap around the universe. In this case we receive CMB photons, which originated at the same physical location on the last scattering surface, from different directions. Observationally this would appear as matched (correlated) circles in the CMB [?]. An analysis by Cornish et al. of the first-year WMAP maps based on a search for matching circles has not found any evidence for a non-trivial topology [?]. However, it is difficult to quantify the probability of missing matching circles, and other groups have claimed a tentative detection of circles at scales not probed by Cornish et al. (see e.g. [?]). In this paper we study a different approach which can in principle yield both an optimal test and a rigorous assessment of the fundamental detection power of the CMB for a cosmic topology.

Instead of working directly with the observed map of 
CMB temperature fluctuations, we expand the map in 
terms of spherical harmonics, 

$$T(x) = \sum_{\ell m} a_{\ell m} Y_{\ell m}(x), \qquad (1)$$

where $x$ are the pixels. Both the pixels and the expansion coefficients $a_{\ell m}$ are random variables. In the simplest models of the early universe, they are to a good approximation Gaussian random variables, an assumption that we will make throughout this paper. Their n-point correlation functions are then completely determined by the two-point correlation function. The homogeneity and isotropy of the simply-connected FLRW universe additionally requires the two-point correlation of the $a_{\ell m}$ to be diagonal,

$$\langle a_{\ell m} a^*_{\ell' m'} \rangle = C_\ell \, \delta_{\ell\ell'} \delta_{mm'}. \qquad (2)$$

The symmetry group $\Gamma$ will introduce preferred directions, which will break global isotropy. This in turn induces correlations between off-diagonal elements of the two-point correlation matrix. In this paper we study methods to find such off-diagonal correlations. Such a test is complementary to the matched-circle test of [9, 10], and if the initial fluctuations are Gaussian then it
can use all the information present in the CMB maps and 
so lead to optimal constraints on the size of the universe. 
Investigating the amount of information introduced into 
the two-point correlation matrix by a given topology al- 
lows us to decide from an information theoretical stand- 
point whether the CMB will ever be able to constrain 
that topology. 

We will use the following notation: We often combine the $\ell$ and $m$ indices into a single index $s = \ell(\ell + 1) + m$ and mix both notations frequently. The noisy correlation matrix given by the data is $A_{ss'} = a_s a^*_{s'}$. We will write the correlation matrix which defines a given topology as $B_{ss'}$. This is the expectation value of the two-point correlation function for $a_s$ that describe a universe with that topology.
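The combined index can be illustrated with a short sketch. The function names below are illustrative assumptions and not part of the analysis code; the only input taken from the text is the rule $s = \ell(\ell+1) + m$.

```python
import math


def lm_to_s(ell, m):
    """Combine (ell, m) into the single index s = ell*(ell + 1) + m."""
    if abs(m) > ell:
        raise ValueError("need |m| <= ell")
    return ell * (ell + 1) + m


def s_to_lm(s):
    """Invert the mapping. For a given ell, s runs from ell**2 to
    (ell + 1)**2 - 1, so ell is the integer square root of s."""
    ell = math.isqrt(s)
    return ell, s - ell * (ell + 1)


# The largest index for a given ell_max is reached at m = ell_max.
print(lm_to_s(60, 60), lm_to_s(16, 16))
```

With this convention the largest index is $\ell_{\max}(\ell_{\max}+2)$, which reproduces the values 3720 and 288 quoted for $\ell_{\max} = 60$ and $\ell_{\max} = 16$.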

All the simulations in this paper are based on a flat $\Lambda$CDM model with $\Omega_\Lambda = 0.7$, a Hubble parameter of $h = 0.67$, a Harrison-Zel'dovich initial power spectrum ($n_s = 1$) and a baryon density of $\Omega_b h^2 = 0.019$, as described in [?]. With this choice of cosmological parameters we find a Hubble radius $1/H_0 \approx 4.8$ Gpc while the radius of the particle horizon is $R_h \approx 15.6$ Gpc. We will denote a toroidal topology as T[X,Y,Z] where X, Y and Z are the sizes of the fundamental domains, in units of the Hubble radius. As an example, T[4,4,4] is a cubic torus of size $(19.3\,\mathrm{Gpc})^3$. The volume of such a torus is nearly half that of the observable universe. The diameter of the particle horizon is about $6.5/H_0$. But we should note that there are non-zero off-diagonal terms in $B_{ss'}$ even for universes that are slightly larger than the particle horizon.
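The size comparison above is simple arithmetic and can be checked with a few lines. This is only a sketch using the rounded values quoted in the text (Hubble radius 4.8 Gpc, horizon radius 15.6 Gpc); the variable and function names are assumptions, and the rounded Hubble radius gives a side length of 19.2 Gpc rather than the 19.3 Gpc quoted.

```python
import math

HUBBLE_RADIUS_GPC = 4.8    # rounded value quoted in the text
HORIZON_RADIUS_GPC = 15.6  # rounded value quoted in the text


def torus_volume(x, y, z, hubble_radius=HUBBLE_RADIUS_GPC):
    """Volume in Gpc^3 of T[x,y,z], with sides given in Hubble radii."""
    return (x * hubble_radius) * (y * hubble_radius) * (z * hubble_radius)


def horizon_volume(radius=HORIZON_RADIUS_GPC):
    """Volume in Gpc^3 of the sphere bounded by the particle horizon."""
    return 4.0 / 3.0 * math.pi * radius ** 3


volume_ratio = torus_volume(4, 4, 4) / horizon_volume()
horizon_diameter = 2.0 * HORIZON_RADIUS_GPC / HUBBLE_RADIUS_GPC
print(f"T[4,4,4] / observable volume = {volume_ratio:.2f}")
print(f"horizon diameter = {horizon_diameter:.1f} Hubble radii")
```

The ratio comes out a little below one half, and the diameter at 6.5 Hubble radii, in line with the statements above.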

We have a range of correlation matrices at our disposal so far. Two of them are cubic tori with sizes $2/H_0$ (T[2,2,2]) and $4/H_0$ (T[4,4,4]). For these two we have the correlation matrices up to $\ell_{\max} = 60$ (corresponding to $s_{\max} = 3720$). We also have two families of slab spaces. The first one, T[X,X,1], has one very small direction of size $1/H_0$. The second one, T[15,15,X], has two large directions that are effectively infinite. Both groups include all tori with X = 1, 2, ..., 15, and we know their correlation matrices up to $\ell_{\max} = 16$ (or $s_{\max} = 288$). The correlation matrices analysed in this paper do not contain the integrated Sachs-Wolfe contributions (cf. the discussion in section VIII).

This paper is organised as follows: We start out by 
matching the measured correlations to a given correlation 
matrix. We then show that a similar power to distinguish 
between different correlation matrices can be achieved by 
using the likelihood. In general we do not know the rel- 
ative orientation of the map and the correlation matrix, 
and we discuss how to deal with this issue next. We then 
present a first set of results from this analysis, before 
embarking on a simplified analysis of the WMAP CMB 
data and toroidal topologies. 

So far the methods were all of a frequentist nature. Using the likelihood we can also study the evidence for a given topology, which is the Bayesian approach to model selection. We then talk about the issues that we neglected in this paper, and finish with conclusions.

The appendices look in more detail at how the cor- 
relation and the likelihood method differ, and how their 
underlying structure can be used to define "optimal" esti- 
mators. We also discuss how selecting an extremum over 
all orientations can be linked to extreme value distribu- 
tions, which allows us to derive probability distribution 
functions that can be fitted to the data for quantifying 
confidence levels. We finally consider a distance func- 
tion on covariance matrices, motivated by the Bayesian 
evidence discussion, and study its application to the com- 
parison between different topologies. 

II. DETECTING CORRELATIONS 

A priori it is very simple to check whether there are significant off-diagonal terms present in the two-point correlation matrix: One just looks at terms with $\ell \neq \ell'$ and/or $m \neq m'$. But the variance of the $a_{\ell m}$ is too large as we can observe only a single universe. When computing the $C_\ell$ we average over all directions $m$. This averaging then leads to a cosmic variance that behaves like $1/\sqrt{\ell}$. But now we have to consider each element of the correlation matrix separately, leading to a cosmic variance of order 1 for each element. The matrix is therefore very noisy and we need to "dig out" the topological signal from the noise. Furthermore, if we detect the presence of significant off-diagonal correlations, we still need to verify that they are due to a non-trivial topology and not to some other mechanism that breaks isotropy.

A natural approach to the problem is then to use the expected correlation matrix for a given topology as a kind of filter. To this end we compute a correlation amplitude $\lambda$ which describes how close two matrices are. We do this by minimising

$$\chi^2[\lambda] = \sum_{ss'} \left| A_{ss'} - \lambda B_{ss'} \right|^2 \qquad (3)$$

where $A_{ss'} = a_s a^*_{s'}$ is the correlation matrix estimated from the data and $B_{ss'}$ the one which contains the topology that we want to test. For a good fit we expect to find $\lambda \approx 1$ while for a bad fit $\lambda \approx 0$.

We can easily solve $d\chi^2/d\lambda = 0$ and find that

$$\lambda = \frac{\sum_{ss'} A_{ss'} B^*_{ss'}}{\sum_{ss'} |B_{ss'}|^2} \qquad (4)$$

minimises Eq. (3). As we know that we will have to compare our method against maps from an infinite universe with the same power spectrum, we do not sum over the diagonal $s = s'$ (which corresponds to $\ell = \ell'$ and $m = m'$) to improve the signal to noise. This corresponds to replacing the correlation matrix through $B \to B - D$ where $D$ is a diagonal matrix with the power spectrum on the diagonal. If the power spectrum is constant so that $D = C \times 1$ then the eigenvectors of the new correlation matrix are the same as those of the original one, and the eigenvalues are replaced by $\epsilon^{(i)} \to \epsilon^{(i)} - C$. In this case they will no longer all be positive.
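The estimator can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the code used for the results in this paper: the function names are hypothetical, the matrices are small nested lists, and the formula implemented is the least-squares amplitude $\lambda = \sum A_{ss'} B^*_{ss'} / \sum |B_{ss'}|^2$ with the diagonal optionally excluded.

```python
def outer_conj(a):
    """Noisy correlation matrix A_ss' = a_s * conj(a_s')."""
    return [[x * y.conjugate() for y in a] for x in a]


def correlation_amplitude(a, B, include_diagonal=False):
    """Least-squares amplitude lambda of the template B in the data a.

    By default the diagonal s = s' is skipped, as described in the text.
    """
    A = outer_conj(a)
    n = len(a)
    numerator = 0.0
    denominator = 0.0
    for s in range(n):
        for t in range(n):
            if s == t and not include_diagonal:
                continue
            # For Hermitian A and B the full sum is real; taking the
            # real part only removes rounding noise.
            numerator += (A[s][t] * B[s][t].conjugate()).real
            denominator += abs(B[s][t]) ** 2
    if denominator == 0.0:
        raise ValueError("template has no terms in the sum (U = 0)")
    return numerator / denominator


# Toy check: a template identical to the data matrix gives lambda = 1.
a_toy = [1 + 0j, 1j]
print(correlation_amplitude(a_toy, outer_conj(a_toy)))
```

A purely diagonal template has nothing left in the sum once the diagonal is removed, which is why the sketch raises an error in that case instead of returning a meaningless ratio.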

We could also introduce a covariance matrix in Eq. (3). In the presence of noise this may be useful. In this study we will assume throughout an idealised noise-free and full-sky experiment for simplicity. At any rate the WMAP data will be cosmic variance dominated at the low $\ell$ that we consider here, see section VIII A. Neglecting the noise contribution, the covariance matrix is $C_{qq'} = \langle B_q B_{q'} \rangle$ where $q = \{s, s'\}$. But as the correlation matrices are already expectation values, we end up with a matrix that has a single non-zero eigenvalue $\epsilon = \sum_q B_q^2$. If we invert this singular matrix with the singular value decomposition (SVD) method (setting the inverse of the zero eigenvalues to zero) and minimise the resulting expression for the $\chi^2$, we find again Eq. (4).

It is straightforward to compute the expectation value and variance of the $\lambda$ function for two important cases. In the first case the universe is infinite, so that the spherical harmonics $a_{\ell m}$ are characterised by the usual two-point function,

$$\langle a_{\ell m} a^*_{\ell' m'} \rangle_\infty = C_\ell \, \delta_{\ell\ell'} \delta_{mm'}. \qquad (5)$$

In the second case the universe has indeed the topology described by the correlation matrix $B$ against which we test the $a_{\ell m}$. In this case the two-point function of the spherical harmonics is given by

$$\langle a_{\ell m} a^*_{\ell' m'} \rangle_B = B_{ss'}. \qquad (6)$$

In both cases the spherical harmonics obey Gaussian statistics and the higher n-point functions are uniquely determined by the two-point function via Wick's theorem.

Let us first define the auto-correlation $U = \sum_{ss'} |B_{ss'}|^2$. We remind the reader that such sums in this section exclude the diagonal terms $s = s'$ except where specifically mentioned. For an infinite universe, we notice that if we sum only over the non-diagonal elements $s \neq s'$ then, since $\langle a_s a^*_{s'} \rangle_\infty = C_s \delta_{ss'}$, the expectation value of $\lambda$ is zero, $\langle\lambda\rangle_\infty = 0$. Else,

$$\langle\lambda\rangle_\infty = \mathrm{tr}(B)/U. \qquad (7)$$

For a finite universe,

$$\langle\lambda\rangle_B = 1, \qquad (8)$$

independently of whether we sum over the diagonal elements or not, as we just recover the auto-correlation in the numerator. Of course the auto-correlation value depends on the summation convention.

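For the variance, in the case of an infinite universe, we find

$$\sigma^2_\infty = \langle\lambda^2\rangle_\infty - \langle\lambda\rangle^2_\infty = \frac{2}{U^2} \sum_{ss'} C_s C_{s'} |B_{ss'}|^2. \qquad (9)$$

The summation depends again on whether we keep the diagonal elements or not. For a whitened map, the result simplifies to $\sigma^2_\infty = 2/U$. In a finite universe,

$$\sigma^2_B = \frac{2}{U^2} \, \mathrm{tr}(B B^* B B^*), \qquad (10)$$

however now we need to be more careful if we discard the diagonal elements, as then

$$\mathrm{tr}(B B^* B B^*) \to \sum_{s_1 \neq s_2,\; s_3 \neq s_4} B_{s_1 s_2} B^*_{s_2 s_3} B_{s_3 s_4} B^*_{s_4 s_1}. \qquad (11)$$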
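The whitened result $\sigma^2_\infty = 2/U$ can be checked numerically on a toy problem. The sketch below is an assumption-laden illustration, not the simulation pipeline of this paper: it uses four real, independent, unit-variance Gaussian variables in place of the $a_{\ell m}$, a small made-up real symmetric template `B_TOY`, and a fixed random seed.

```python
import random
import statistics


def offdiag_power(B):
    """Auto-correlation U: sum of B_ss'^2 over s != s'."""
    n = len(B)
    return sum(B[s][t] ** 2 for s in range(n) for t in range(n) if s != t)


def lam_offdiag(a, B, U):
    """Correlation amplitude for real a and real symmetric B, no diagonal."""
    n = len(a)
    num = sum(a[s] * a[t] * B[s][t] for s in range(n) for t in range(n) if s != t)
    return num / U


def simulate_lambda(B, n_real=20000, seed=1):
    """Mean and variance of lambda over whitened 'infinite universe' draws."""
    rng = random.Random(seed)
    n = len(B)
    U = offdiag_power(B)
    draws = []
    for _ in range(n_real):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(lam_offdiag(a, B, U))
    return statistics.fmean(draws), statistics.pvariance(draws)


# Hypothetical template, chosen only for illustration.
B_TOY = [[1.0, 0.3, 0.1, 0.0],
         [0.3, 1.0, 0.2, 0.1],
         [0.1, 0.2, 1.0, 0.3],
         [0.0, 0.1, 0.3, 1.0]]

mean_lam, var_lam = simulate_lambda(B_TOY)
print(mean_lam, var_lam, 2.0 / offdiag_power(B_TOY))
```

The sample mean should scatter around zero and the sample variance around $2/U$; the agreement is statistical, so it improves with the number of realisations.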



Table I shows the expectation values and variances for a selection of topologies, computed with these formulas. It may be surprising that the variance of $\lambda$ for an infinite universe depends on the test-topology. However, Eq. (4) depends on $B$ even if the $a_{\ell m}$ do not. The variance is a measure of how different $B$ is from the diagonal "correlation matrix" of an infinite universe, Eq. (5). The larger the difference, the smaller the variance of $\lambda$, as the random off-diagonal correlations present in the $a_{\ell m}$ are less likely to match those of the test-matrix $B$. The value of $\ell_{\max}$ in the table was chosen basically arbitrarily; we will discuss later how it influences the measurements. We have also introduced a "signal to noise ratio" S/N which is the difference of the expectation values, divided by the errors added in quadrature,

$$S/N(B, X) = \frac{\left| \langle X \rangle_\infty - \langle X \rangle_B \right|}{\sqrt{\sigma^2_\infty + \sigma^2_B}}. \qquad (12)$$

Here $X$ is the estimator used. This gives only a rough indication of the true statistical significance with which a universe with the given topology can be distinguished from an infinite universe. As the distributions of $\lambda$ and $\chi^2$ are not exactly Gaussian, S/N is not exactly measured in units of standard deviations. However, it is sufficient to compare the different methods and to illustrate how well different topologies can be detected. For precise statistical results we fit the full distribution, see appendix B.


topology      $\ell_{\max}$   $\lambda_\infty$   $\lambda_B$    S/N [$\sigma$]
T[2,2,2]      60              0 ± 0.017          1 ± 0.102      9.7
T[4,4,4]      60              0 ± 0.046          1 ± 0.082      10.6
T[2,2,2]      16              0 ± 0.03           1 ± 0.34       2.9
T[4,4,4]      16              0 ± 0.09           1 ± 0.22       4.2
T[6,6,1]      16              0 ± 0.08           1 ± 0.33       2.9
T[15,15,6]    16              0 ± 0.51           1 ± 0.59       1.3

TABLE I: Comparison of the mean and standard deviation of $\lambda$ for different topologies and different $\ell_{\max}$, normalised with the true power spectrum. The S/N value is given by Eq. (12).
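As a small worked example, the S/N column of Table I follows directly from the quoted means and standard deviations. The helper below is an illustrative sketch (its name is an assumption); it only implements the difference of the means divided by the errors added in quadrature.

```python
import math


def signal_to_noise(mean_inf, sigma_inf, mean_b, sigma_b):
    """|difference of the means| divided by the errors added in quadrature."""
    return abs(mean_inf - mean_b) / math.hypot(sigma_inf, sigma_b)


# First two rows of Table I: T[2,2,2] and T[4,4,4] at ell_max = 60.
print(round(signal_to_noise(0.0, 0.017, 1.0, 0.102), 1))  # 9.7
print(round(signal_to_noise(0.0, 0.046, 1.0, 0.082), 1))  # 10.6
```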



The power spectrum $C_\ell$ depends of course on the cosmological parameters. To minimise this potential problem we normalise the correlation matrices either by the diagonal $C_s = \langle a_s a^*_s \rangle$ or by the usual orientation-averaged power spectrum

$$C_\ell = \frac{1}{2\ell + 1} \sum_m \langle a_{\ell m} a^*_{\ell m} \rangle \qquad (13)$$

via the prescription

$$B_{ss'} \to \frac{B_{ss'}}{\sqrt{C_s C_{s'}}}. \qquad (14)$$

This is often called "whitening", and it serves to enforce the same (white noise) power spectrum in both the template and the model being tested. After applying this normalisation the power spectrum is just $C_s = 1$. We apply the same normalisation to the $a_{\ell m}$. As we will not in general know their "true" input power spectrum, we use the one recovered from the $a_{\ell m}$ themselves. As can be seen in table II the division by the recovered power spectrum greatly reduces the variance of $\lambda$ and so improves the detection power for the different topologies. Contrary to table I we could not compute the numbers analytically and have estimated them from $10^4$ random realisations each of maps with the trivial topology and the $B$ topology.
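The whitening step can be sketched as follows. This is a minimal illustration with hypothetical function names: the matrix is a nested list normalised by its own diagonal, and the coefficients are stored in a dictionary keyed by $(\ell, m)$ and divided by the power spectrum estimated from the coefficients themselves. The sketch assumes that every $m$ of each $\ell$ is present and that the estimated power is non-zero.

```python
import math


def whiten_matrix(B):
    """Return B_ss' / sqrt(B_ss * B_s's'), so that the diagonal becomes 1."""
    n = len(B)
    diag = [B[s][s].real for s in range(n)]
    return [[B[s][t] / math.sqrt(diag[s] * diag[t]) for t in range(n)]
            for s in range(n)]


def whiten_alm(alm):
    """Divide each a_lm by the square root of the m-averaged power
    estimated from the same coefficients.

    alm: dict mapping (ell, m) to a complex coefficient.
    """
    power = {}
    for (ell, _m), a in alm.items():
        power[ell] = power.get(ell, 0.0) + abs(a) ** 2
    c_ell = {ell: p / (2 * ell + 1) for ell, p in power.items()}
    return {(ell, m): a / math.sqrt(c_ell[ell]) for (ell, m), a in alm.items()}


print(whiten_matrix([[4.0, 1.0], [1.0, 9.0]]))
```

After whitening with the estimated spectrum, the coefficients of each multipole satisfy $\sum_m |a_{\ell m}|^2 = 2\ell + 1$ exactly, which is what removes part of the realisation-to-realisation scatter.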



topology      $\ell_{\max}$   $\lambda_\infty$   $\lambda_B$       S/N [$\sigma$]
T[2,2,2]      60              0 ± 0.015          0.973 ± 0.030     29.0
T[4,4,4]      60              0 ± 0.051          0.976 ± 0.044     14.5
T[2,2,2]      16              0 ± 0.032          0.924 ± 0.100     8.8
T[4,4,4]      16              0 ± 0.091          0.948 ± 0.100     7.0
T[6,6,1]      16              0 ± 0.083          0.894 ± 0.200     4.1
T[15,15,6]    16              0 ± 0.534          0.971 ± 0.553     1.3

TABLE II: Comparison of the mean and standard deviation of $\lambda$ for different topologies and different $\ell_{\max}$, normalised with the power spectrum estimated independently for each realisation. As we see, the signal to noise ratio is improved considerably.

For an infinite universe $C_s$ is independent of $m$ and it does not matter whether we divide by $C_s$ or $C_\ell$. For non-trivial topologies this is not the case as additional correlations are induced in different $m$ modes. For this reason, the division by the $m$-averaged $C_\ell$ tends to lead to somewhat stronger constraints. Of course we lose the information encoded in the power spectrum, like the suppression of fluctuations with wavelengths larger than the size of the universe. However, we feel that the improved stability to mis-estimates of the power spectrum and the reduced dependence on the cosmological parameters is worth the trade-off.

The numerical evaluation of Eq. (4) requires a double sum over $s_{\max} = \ell_{\max}(\ell_{\max} + 2)$ matrix coefficients. It scales therefore as $\ell_{\max}^4$. But the correlation matrix of an infinite universe is diagonal, so that we only need to perform a single sum. It should therefore be possible to reduce the work for matrices that are close to being diagonal, i.e. for universes with a very large compactification scale. A possibility is to decompose the correlation matrix into a sum over eigenvalues and eigenvectors. We can then retain only the most important eigenvectors. As the correlation matrix is also a covariance matrix, this is somewhat analogous to principal component analysis or the Karhunen-Loeve transform. For a correlation matrix $B$ we will write the decomposition as

$$B_{ss'} = \sum_i \epsilon^{(i)} v^{(i)}_s v^{(i)*}_{s'}. \qquad (15)$$

The $\epsilon^{(i)}$ are the eigenvalues of the matrix $B$ and they are real and positive as the matrix is hermitian and positive. This allows us to define effective spherical harmonics $b^{(i)}_s = \sqrt{\epsilon^{(i)}} \, v^{(i)}_s$, which have, for example, the same properties under rotation as the usual $a_{\ell m}$.
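The cost of the full double sum mentioned at the start of this discussion can be made concrete with a short calculation. The sketch below only counts terms; the function names are illustrative assumptions, and constant factors as well as any savings from a truncated eigenvector expansion are ignored.

```python
def s_max(ell_max):
    """Number of matrix rows for a given ell_max: ell_max * (ell_max + 2)."""
    return ell_max * (ell_max + 2)


def double_sum_terms(ell_max):
    """Terms in the double sum over s and s', which grows like ell_max**4."""
    return s_max(ell_max) ** 2


cost_ratio = double_sum_terms(60) / double_sum_terms(16)
print(s_max(16), s_max(60), round(cost_ratio))
```

Going from $\ell_{\max} = 16$ to $\ell_{\max} = 60$ therefore increases the work per evaluation by more than two orders of magnitude, which is the motivation for looking at reduced representations.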



III. USING THE LIKELIHOOD 

Instead of considering the correlation between the recovered and the theoretical matrix, we can think of the two-point correlation matrix as the covariance matrix of the $a_{\ell m}$. Then we may ask the question, what is the probability of a covariance matrix $C$ given the measured $a_{\ell m}$? This can be answered using Bayesian statistics.

In a first step we need to construct the likelihood function. The probability distribution for a Gaussian random variable $x$ with variance $\sigma^2$ and zero expectation value is

$$p(x|\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{x^2}{2\sigma^2} \right). \qquad (16)$$

If we assume that we measure $x$ and want to know $\sigma$, then the likelihood function for finding a certain $x$ is given by $\mathcal{L}(\sigma) = p(x|\sigma)$. We write the likelihood as a function of the variance, as this is the model parameter that we are interested in.

For many independent variables, the probability distribution is the product, which leads to a sum in the exponent. In the case of the $a_{\ell m}$, the random variables are not independent but are distributed according to a multivariate Gaussian distribution with a covariance matrix $C$. The likelihood function then is

$$p(a_{\ell m}|C) = \mathcal{L}(C) \propto \frac{1}{\sqrt{|C|}} \exp\left[ -\frac{1}{2} \sum_{s,s'} a^*_s C^{-1}_{ss'} a_{s'} \right], \qquad (17)$$

where $|C|$ is the determinant of the matrix $C$. The covariance matrix is given by the two-point correlation matrix, and $\langle a_s \rangle = 0$. Any further model assumptions are implicitly included in the choice of $C$. Using Bayes' law we can invert the probability to find



$$p(C|a_{\ell m}) = \frac{p(a_{\ell m}|C) \, p(C)}{p(a_{\ell m})}. \qquad (18)$$

The probability in the denominator is a normalisation constant, while $p(C)$ is the prior probability of a given topology encoded by $C$. We will assume that we have no prior information about the topology of the universe, so that this is a constant as well. In this case $p(C|a_{\ell m}) \propto \mathcal{L}(C)$, i.e. we can use the likelihood function to estimate the probability of a topology given a set of $a_{\ell m}$. For our purpose, the covariance matrix is just given by the correlation matrix $B$. In general, one may have to add noise to it, and maybe introduce a sky cut.
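With a flat prior, comparing a finite list of candidate topologies reduces to normalising their likelihoods. The sketch below is a hedged illustration with a hypothetical function name; it works with log-likelihoods and subtracts the maximum before exponentiating, which is a standard numerical precaution and an addition of this example, since the likelihoods themselves would underflow for the large $\chi^2$ values met in practice.

```python
import math


def posterior_flat_prior(log_likelihoods):
    """Posterior probabilities of a discrete set of models, flat prior.

    Subtracting the maximum log-likelihood first avoids underflow in exp().
    """
    top = max(log_likelihoods)
    weights = [math.exp(v - top) for v in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# Two models whose likelihoods differ by a factor of three.
print(posterior_flat_prior([0.0, -math.log(3.0)]))
```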

Generally it is preferable to consider the logarithm of the likelihood, $\log(\mathcal{L}) = -\frac{1}{2}\left( \log|B| + \chi^2 \right) + \mathrm{const}$, where we have defined

$$\chi^2 = \sum_{s,s'} a^*_s B^{-1}_{ss'} a_{s'}. \qquad (19)$$

We notice that there is a potential issue with the normalisation of the input model: If $a_s \to 0$ then $\chi^2 \to 0$; generally any model whose $a_s$ lead to a bad fit (high $\chi^2$) could be renormalised until a reasonable likelihood is obtained. It is therefore required to fix the overall normalisation, and we will do this by using the whitened $a_s$, in which case the normalisation is fixed by $\sum_s |a_s|^2 = 1$.
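The log-likelihood can be evaluated without forming the inverse matrix explicitly. The following sketch is a simplified stand-in, not the implementation used here: it assumes real data and a real, symmetric, positive definite covariance stored as nested lists, and it uses a Cholesky factorisation $C = L L^T$ so that $\chi^2 = |L^{-1} a|^2$ and $\log|C| = 2 \sum_i \log L_{ii}$. All function names are hypothetical.

```python
import math


def cholesky(C):
    """Lower-triangular L with C = L L^T, for real symmetric positive C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return L


def chi2_and_logdet(a, C):
    """Return (a^T C^-1 a, log|C|) using forward substitution on L y = a."""
    L = cholesky(C)
    n = len(a)
    y = [0.0] * n
    for i in range(n):
        y[i] = (a[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    chi2 = sum(v * v for v in y)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return chi2, logdet


def log_likelihood(a, C):
    """log L = -(log|C| + chi2) / 2, up to an additive constant."""
    chi2, logdet = chi2_and_logdet(a, C)
    return -0.5 * (logdet + chi2)


print(log_likelihood([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]]))
```

Scaling the data vector towards zero drives $\chi^2$ to zero for any template, which is the normalisation issue described above; the factorisation itself has to be computed only once per template.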

For the two special cases, the infinite universe and $a_{\ell m}$ distributed according to $B$, we can compute expectation value and variance. For the general case we will write $\langle a_s a^*_{s'} \rangle = A_{ss'}$. Then

$$\langle \chi^2 \rangle = \sum_{ss'} \langle a^*_s a_{s'} \rangle B^{-1}_{ss'} = \mathrm{tr}(A B^{-1}), \qquad (20)$$

where we have used the hermiticity of the correlation matrices. The two special cases are

$$\langle \chi^2 \rangle_\infty = \sum_s C_s B^{-1}_{ss}, \qquad (21)$$

$$\langle \chi^2 \rangle_B = \mathrm{tr}(1) = s_{\max}. \qquad (22)$$

As the $a_{\ell m}$ are Gaussian random variables, we expect to find that $\chi^2$ is distributed with a $\chi^2$-like distribution. The general expression is rather cumbersome, but for the two special cases we find

$$\sigma^2_B = \langle (\chi^2)^2 \rangle_B - \langle \chi^2 \rangle^2_B = 2 s_{\max} \qquad (23)$$

and

$$\sigma^2_\infty = 2 \sum_{ss'} C_s C_{s'} \left| B^{-1}_{ss'} \right|^2. \qquad (24)$$

We list in table III some examples, together with the number of standard deviations that the two expectation values lie apart.
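For the matching-topology case the expectation value and standard deviation of $\chi^2$ depend only on the number of modes, so they can be checked by hand. A minimal sketch (the function name is an assumption), using mean $s_{\max}$ and variance $2 s_{\max}$:

```python
import math


def chi2_stats_matching_topology(ell_max):
    """Mean and standard deviation of chi^2 when the data follow B."""
    s_max = ell_max * (ell_max + 2)
    return s_max, math.sqrt(2.0 * s_max)


for ell_max in (60, 16):
    mean, sigma = chi2_stats_matching_topology(ell_max)
    print(ell_max, mean, round(sigma))
```

This gives 3720 ± 86 for $\ell_{\max} = 60$ and 288 ± 24 for $\ell_{\max} = 16$, the values that appear for every topology in the $\chi^2_B$ column of table III.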

In these computations, as in the corresponding ones for the correlation coefficient, we have assumed that we normalise the observed $a_{\ell m}$ by the "true" power spectrum (or diagonal). However, we do not know what it is. If we instead normalise them by the estimated one (which is different for each realisation), we change the statistics. It is now no longer Gaussian. Table IV reproduces the previous one, but now for this scenario. We estimated the numbers from $10^4$ numerical realisations for each topology. Again the detection power increases considerably.

topology      $\ell_{\max}$   $\chi^2_\infty$    $\chi^2_B$     S/N [$\sigma$]
T[2,2,2]      60              37168 ± 2373       3720 ± 86      14.1
T[4,4,4]      60              14656 ± 1517       3720 ± 86      7.2
T[2,2,2]      16              5608 ± 738         288 ± 24       7.2
T[4,4,4]      16              1802 ± 300         288 ± 24       5.0
T[6,6,1]      16              20781 ± 7103       288 ± 24       2.9
T[15,15,6]    16              309 ± 28           288 ± 24       0.6

TABLE III: Same as table I for $\chi^2$.

topology      $\ell_{\max}$   $\chi^2_\infty$    $\chi^2_B$     S/N [$\sigma$]
T[2,2,2]      60              37366 ± 1123       4655 ± 438     27.1
T[4,4,4]      60              14932 ± 1157       4027 ± 162     9.3
T[2,2,2]      16              5690 ± 477         474 ± 131      10.5
T[4,4,4]      16              1841 ± 196         335 ± 48       7.5
T[6,6,1]      16              21093 ± 5645       786 ± 557      3.6
T[15,15,6]    16              309 ± 10           289 ± 5        1.8

TABLE IV: Same as table II for $\chi^2$.

In appendix A we compare the structure of the correlation estimator to the likelihood $\chi^2$. We find that for many cases the $\chi^2$ has minimal variance.

IV. ROTATING THE MAP INTO POSITION 

The situation discussed so far is somewhat misleading: Nature is rather unlikely to align the topology of the universe with our coordinate system. The correlation matrices are not invariant under rotations, as rotations mix $a_{\ell m}$ with different $m$. To parametrise the rotations we use the three Euler angles $\alpha$, $\beta$ and $\gamma$ which describe three subsequent rotations around the z, the y and again the z axis. The first and last rotation just lead to a phase change. The rotation around the y-axis couples different $m$ and is given by Wigner rotation matrices $d^\ell_{mm'}$,

$$a_{\ell m} \to \sum_{m'} e^{i(m\alpha + m'\gamma)} \, d^\ell_{mm'}(\beta) \, a_{\ell m'}. \qquad (25)$$

Together, the three rotations can describe any element of the rotation group of order $\ell$. We use the relations given in [?] to compute the rotation matrices. Figure 1 shows an example where we plot $\lambda$ while rotating the $a_{\ell m}$ azimuthally. The figure represents the case for $\ell_{\max} = 60$; for lower values of $\ell_{\max}$ the peaks are less sharp and there is less sub-structure. The same is true for the $\chi^2$, while the peaks for the likelihood, which is proportional to $\exp(-\chi^2/2)$, are even narrower.
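The statement that rotations around the z-axis only change phases can be illustrated directly. The sketch below is an assumption-based example: the function name is hypothetical, the coefficients are stored in a dictionary keyed by $(\ell, m)$, and the sign of the phase, $e^{+im\varphi}$, is a convention chosen for the example (the opposite sign is equally common).

```python
import cmath
import math


def rotate_z(alm, phi):
    """Azimuthal rotation: a_lm -> exp(i m phi) a_lm (sign convention assumed)."""
    return {(ell, m): cmath.exp(1j * m * phi) * a for (ell, m), a in alm.items()}


alm_demo = {(2, -1): 1 + 2j, (2, 0): 0.5 + 0j, (2, 2): -1j}
rotated = rotate_z(alm_demo, 0.7)

# Products a_lm a*_l'm' pick up the phase exp(i (m - m') phi), so matrix
# elements with m = m' are unchanged while the others rotate in phase.
pair_before = alm_demo[(2, -1)] * alm_demo[(2, 2)].conjugate()
pair_after = rotated[(2, -1)] * rotated[(2, 2)].conjugate()
print(abs(pair_before), abs(pair_after), cmath.phase(pair_after / pair_before))
```

The moduli of all coefficients and of all matrix elements are preserved; only the relative phases of elements with $m \neq m'$ change, which is why the agreement with a fixed template depends so sharply on the rotation angle.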

We can therefore not avoid probing all possible rotations, either by computing the average or by taking the maximum/minimum of our estimator over all orientations. Possibly the most straightforward approach is to try many random rotations [16]. This is simple to program and automatically uses any symmetries present in the template. But due to the precision needed to find the best alignment for some templates, we found that we need in excess of $10^6$ rotations to get correct results for $\ell_{\max} = 60$. We can on the other hand probe systematically all orientations, for example with the total convolution method [?]. In this approach, the rotations with the three Euler angles are replaced by a three-dimensional FFT. This speeds the procedure up by a large factor. However, we found that we may nonetheless miss the best-fit peaks which can be very sharp (see Figs. 1 and 2).

FIG. 1: Behaviour of the correlation coefficient $\lambda$ under a rotation around the z-axis (horizontal axis: rotation angle). The signal is maximal only for very well-defined alignments. We used a T[2,2,2] correlation matrix and $a_{\ell m}$ derived from a T[2,2,2] topology.

FIG. 2: The maximal correlation coefficient for the case of a universe with T[2,2,2] orientation. The sharp, high peaks correspond to the correct orientation of the map with respect to the template.

If we limit ourselves to finding the maximum/minimum 
efficiently, then we can also start with a random rotation 



and search for a local extremum nearby. We then repeat 
the procedure for different random starting locations un- 
til we have found a stable global maximum (for exam- 
ple, eight times the same global maximum). This is the 
safest method, and can be relatively fast depending on 
the topology. 
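The restart strategy described above can be sketched as follows. Everything in this block is an illustrative assumption: the local optimiser is a simple coordinate-wise hill climb with a shrinking step rather than whatever method was used for the results, the objective `toy_objective` is a smooth stand-in for an estimator as a function of the three Euler angles, and the tolerances are arbitrary.

```python
import math
import random


def local_maximise(f, x0, step=0.5, tol=1e-6):
    """Coordinate-wise hill climb; the step is halved when nothing improves."""
    x = list(x0)
    best = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = f(trial)
                if value > best:
                    x, best, improved = trial, value, True
        if not improved:
            step *= 0.5
    return x, best


def multi_start_maximum(f, n_repeat=8, max_starts=1000, match_tol=1e-4, seed=0):
    """Restart from random Euler angles until the same best value has been
    found n_repeat times (or max_starts is exhausted)."""
    rng = random.Random(seed)
    best_x, best_val, hits = None, -math.inf, 0
    for _ in range(max_starts):
        start = [rng.uniform(0.0, 2.0 * math.pi),
                 rng.uniform(0.0, math.pi),
                 rng.uniform(0.0, 2.0 * math.pi)]
        x, value = local_maximise(f, start)
        if value > best_val + match_tol:
            best_x, best_val, hits = x, value, 1   # new, better maximum
        elif abs(value - best_val) <= match_tol:
            hits += 1                               # same maximum again
        if hits >= n_repeat:
            break
    return best_x, best_val, hits


def toy_objective(angles):
    """Smooth stand-in for an estimator; its maximum value is 3."""
    alpha, beta, gamma = angles
    return math.cos(alpha - 1.0) + math.cos(beta - 0.5) + math.cos(gamma - 2.0)


best_angles, best_value, n_hits = multi_start_maximum(toy_objective)
print(best_value, n_hits)
```

For a real template the landscape has many narrow secondary peaks, so the number of restarts needed before the same maximum recurs can be far larger than in this smooth toy case.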

Computing the average is therefore quite difficult and slow. We also found that using the maximum or minimum results in a much stronger detection than using the average, at least for the $\lambda$ and $\chi^2$ estimator. It is possible to improve the average by using the likelihood which is proportional to $\exp(-\chi^2/2)$. This decreases the weight of the "wrong" orientations exponentially. However, it makes the average even harder to compute. Furthermore, it lends itself readily to a Bayesian interpretation which is quite different from the frequentist approach followed so far. For this reason we will consider only the maximum/minimum approach here and defer the discussion of the average likelihood to section VII. We also note that it makes no difference if we consider the $\chi^2$ estimator or the likelihood when using the extremum over orientations. The exponential function is monotonic and so the maximum or minimum point will not change under it (except that the minimum of the $\chi^2$ will turn into a maximum of the likelihood and vice versa). For the same reason, it does not change the statistical weight. If 99 realisations of model A have a lower $\chi^2$ than any of model B, then those 99 realisations will have a higher likelihood as well.

A drawback of using the extremum over all rotations is that we do not know the resulting distribution function. In general we have to compute a large number of test-cases to obtain the distribution, but this is very time-consuming and for high $\ell_{\max}$ computing more than a few hundred realisations becomes prohibitive, at least on a single processor. Instead we can find a good approximation to the new distribution by assuming that each rotation leads to a new independent Gaussian distribution. If there are $N$ independent rotations then we need to know the distribution of the maximal value of $N$ draws from a Gaussian distribution. This leads to an extreme value distribution, and exact results are known only for $N < 6$. However, for very large $N$, the distribution should converge to one of three limiting cases, analogously to the central limit theorem (see e.g. [?]). If we fit these distributions to the numerical results then we can obtain confidence limits with a reasonable amount of cpu-time. We discuss this in more detail in appendix B.
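The effect of taking the maximum over many draws can be illustrated with a toy simulation. The sketch below assumes $N$ independent standard normal draws per realisation, which is the idealisation described above; the function names, the value $N = 100$, the number of realisations and the seed are arbitrary choices for the example. The cumulative distribution of the maximum of independent draws is the single-draw distribution raised to the power $N$.

```python
import math
import random
import statistics


def normal_cdf(x):
    """Cumulative distribution function of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_of_maximum(x, n):
    """P(max of n independent standard normal draws <= x) = Phi(x)**n."""
    return normal_cdf(x) ** n


def simulate_maxima(n, n_real=5000, seed=3):
    """Draw n_real realisations of the maximum of n standard normal draws."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(n_real)]


maxima = simulate_maxima(100)
print(statistics.fmean(maxima), statistics.pstdev(maxima))
```

The maxima cluster in a band that is shifted upwards and is noticeably narrower than the unit-width parent distribution, which is the behaviour exploited when fitting extreme value distributions to the simulated estimator values.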

We compare in tables V and VI the minimal $\chi^2$ and maximal $\lambda$ values respectively, taken over all possible orientations. We also quote the resulting S/N value. We notice that especially the $\chi^2$ estimator gains in sensitivity. This seems rather surprising, as the distance between the estimator values of an infinite and a finite universe will in general decrease when taking the extremum. However, we also notice that the variance is dramatically decreased, which in turn leads to the even higher detection power.
topology      $\ell_{\max}$   $\chi^2_\infty$    $\chi^2_B$     S/N [$\sigma$]
T[2,2,2]      60              33237 ± 586        4588 ± 382     41
T[4,4,4]      60              11146 ± 438        4057 ± 204     14
T[2,2,2]      16              4062 ± 172         469 ± 172      17
T[4,4,4]      16              1180 ± 73          350 ± 47       10
T[6,6,1]      16              7719 ± 1125        675 ± 370      6
T[15,15,6]    16              287 ± 2.1          285 ± 2.5      0.6

TABLE V: Comparison of the mean and standard deviation of the $\chi^2$ for different topologies and different $\ell_{\max}$, normalised with the power spectrum and minimised over rotations.



The reduction of the variance, especially for the infinite universe case, is easy to understand. In tables III and IV we use the best-fit alignment for the maps of a finite universe. But the maps with the trivial topology are always randomly aligned (being statistically isotropic). The variance for the infinite universe maps contains therefore an effective "random orientation" contribution. Taking the extremum over all orientations eliminates this contribution. As the infinite universe variance dominates strongly in the case of the $\chi^2$ estimator, we find that this estimator benefits more from the reduction of the variance.



topology      $\ell_{\max}$   $\lambda_\infty$   $\lambda_B$      S/N [$\sigma$]
T[2,2,2]      60              0.08 ± 0.01        0.98 ± 0.03      28
T[4,4,4]      60              0.21 ± 0.02        0.98 ± 0.05      14
T[2,2,2]      16              0.16 ± 0.02        0.95 ± 0.08      10
T[4,4,4]      16              0.38 ± 0.05        0.98 ± 0.09      6
T[6,6,1]      16              0.35 ± 0.05        0.94 ± 0.19      3
T[15,15,6]    16              1.84 ± 0.25        1.86 ± 0.27

TABLE VI: Same as table V for $\lambda$.

As a final point, we notice that the maximised value of $\lambda$ for the T[15,15,6] topology in table VI is larger than 1. This is a sign that we cannot detect that topology. The fluctuations are so large that they completely overwhelm the signal. After maximising over orientations we end up with $\lambda > 1$.



V. DISCUSSION OF GENERAL RESULTS 



A. What angular resolution is required? 



Is it better to test the maps to arbitrarily high $\ell_{\max}$, or to use a lower resolution? One important consideration is the amount of work (and thus of time) needed to evaluate the estimator. For both estimators we need to sum over $s$ and $s'$. This means that the required number of operations scales like $\ell_{\max}^4$. The matrix inversion required for the likelihood evaluation scales like $\ell_{\max}^6$. However, for two reasons it is normally not the limiting factor. Firstly, as discussed in the previous section, we still need to average over directions. To do that we only need to invert the matrix once at the start, not for every evaluation. But we need to evaluate the likelihood for each orientation, and the number of the required rotations scales roughly like $\ell_{\max}^3$. We therefore end up with a $\ell_{\max}^7$ scaling at any rate. Secondly, the most time consuming procedure is the estimation of the variance using simulated maps, and again we only need to invert the matrix once as it stays the same. $\ell_{\max}^7$ is a rather steep growth, and it is certainly preferable to use the smallest matrices that guarantee a detection.

On the other hand, does the detection always improve with growing $\ell_{\max}$? Let us have a look at the correlation estimator, in the case of a whitened map. Clearly $\sigma^2_\infty = 2/U$ can only decrease as long as there are any off-diagonal elements in the correlation matrix. But this is not the dominant error. However, we expect that the main contribution to Eq. (11) is due to the remaining diagonal entries $s_2 = s_3$ and $s_1 = s_4$. This term of the sum is equal to the auto-correlation $U$ and so contributes the same error as $\sigma_\infty$. As the signature of the topology becomes very weak, we expect that the two errors become comparable, but are still decreasing functions of $\ell_{\max}$.

FIG. 3: Detection significance assuming that we know the correct orientation. The topologies were T[2,2,2] (solid black and dotted red line) and T[4,4,4] (dashed blue and dot-dashed magenta line). The estimators were respectively the correlation amplitude $\lambda$ (dotted red and dot-dashed magenta line) and the likelihood $\chi^2$ (solid black and dashed blue line).


We compare in Figs. 3 and 4 the scaling of S/N(T[2,2,2])
and S/N(T[4,4,4]) with $\ell_{\max}$, for the correlation estimator
(red dotted and magenta dash-dotted lines, respectively)
and the likelihood method (black solid and blue dashed lines,
respectively). In all cases we used 100 realisations to compute
the average and standard deviation, which explains the noisy
curves. As discussed earlier, we find that taking the extremum
over rotations can increase the detection power,
especially for the $\chi^2$ estimator.

FIG. 4: Detection significance when maximising over all
orientations. The topologies were T[2,2,2] (solid black and
dotted red line) and T[4,4,4] (dashed blue and dot-dashed
magenta line). The estimators were respectively the correlation
amplitude A (dotted red and dot-dashed magenta line) and
the likelihood $\chi^2$ (solid black and dashed blue line).



We also see that for the T[4,4,4] topology and the correct
orientation, the correlation method eventually overtakes
the likelihood method. This is most likely because
the T[4,4,4] correlation matrix is closer to being diagonal
than the T[2,2,2] correlation matrix. At high $\ell$ the
diagonal elements start to dominate the contributions to
the $\chi^2$. The correlator method is not sensitive to this
contribution as it does not sum over the diagonal elements.
After maximising over orientations, on the other
hand, the likelihood is always superior to the correlation
method, except maybe for the highest $\ell_{\max}$.

We further notice that the detection power keeps increasing
with increasing $\ell_{\max}$, even though things tend to
slow down beyond $\ell \approx 40$. This means that it is useful to
consider the largest $\ell$ for which we have the correlation
matrix and which we can analyse in a reasonable amount
of time. Unfortunately, it is also the case (and hardly
surprising) that the smallest universes profit the most
from analysing smaller scales. The traces from large but
finite universes become rapidly weaker as $\ell_{\max}$ increases.
As there is little practical difference between a 20$\sigma$
detection and a 50$\sigma$ detection, it seems in general quite
sufficient to consider scales up to $\ell_{\max} = 40$ to 60. The
higher $\ell$ may become more important when we also
consider the ISW effect.



B. What size of the universe can be detected? 

From the suppression of the low-$\ell$ modes in the angular
power spectrum, the T[4,4,4] topology seems a good
candidate for the global shape of the universe. Can we
constrain it with one of our methods? Tables V and VI
show that we can indeed distinguish a universe with
T[4,4,4] topology from an infinite one at over 10$\sigma$.

As in the previous section we plot in Figs. 5 and 6 the
detection significance both before and after maximising
over directions. This time we study two families of slab
spaces. The first one, T[X,X,1], has one very small direction
of size $1/H_0$ and we vary the other two. We find that
we can clearly detect this kind of topology at $\ell_{\max} = 16$
for any size of the larger dimensions. For this example
topology it is very striking how the correlation estimator
is better if we use the "correct" alignment, while the $\chi^2$
becomes more powerful as we extremise over orientations.




FIG. 5: Detection significance as a function of the size of
the universe (X), assuming that we know the correct
orientation. The topologies were T[X,X,1] (solid black
and dotted red line) and T[15,15,X] (dashed blue and
dot-dashed magenta line). The estimators were respectively the
correlation amplitude A (dotted red and dot-dashed magenta
line) and the likelihood $\chi^2$ (solid black and dashed blue line).
We used $\ell_{\max} = 16$.



The second family, T[15,15,X], is considerably harder to
detect as here two directions are very large and effectively
infinite. For large values of X we cannot find a difference
to an infinite universe. As the third direction shrinks, we
start to see differences, but only for $X < 3/H_0$ can we
detect the non-trivial topology at over 2$\sigma$. In this case
the correlation method is always inferior to the $\chi^2$. In
appendix C we consider a more fundamental distance measure
between correlation matrices, namely the Kullback-Leibler
divergence. We confirm that we will never be able
to distinguish T[15,15,X] with $X > 6/H_0$ from an infinite
universe, see also Fig. 15. This is not very surprising, as
in this case the universe is in all directions larger than
the particle horizon today.

FIG. 6: Detection significance when maximising over all
orientations. The topologies were T[X,X,1] (solid black and
dotted red line) and T[15,15,X] (dashed blue and dot-dashed
magenta line). The estimators were respectively the correlation
amplitude A (dotted red and dot-dashed magenta line) and
the likelihood $\chi^2$ (solid black and dashed blue line). Again
$\ell_{\max} = 16$.



VI. A SIMPLIFIED ANALYSIS OF WMAP DATA

To illustrate the application of these tests to real data,
we perform a simplified analysis of the WMAP [l^ data.
The analysis is simplified in the sense that we do not deal
with issues like map noise and sky cuts. In general, one has to
simulate a large number of maps where both of these effects
are included, and which are then analysed with the same
pipeline as the actual data map. However, as an illustration
we will analyse reconstructed full-sky maps. We
use the internal linear combination (ILC) map created
by the WMAP team, which we will call the WMAP map
from now on. We also use two map reconstructions by
Tegmark, a Wiener filtered map (TW) and a foreground-cleaned
map (TC) [2(j. All of these maps are publicly
available in HEALPix format [21] with a resolution of
$N_{\rm side} = 512$. We use this software package to read the
map files and to convert them into $a_{\ell m}$.

To get some idea of the systematic errors in this analysis,
we additionally analyse the ILC map reconstructed
by Eriksen et al. (LILC). They also produced a set of
simulated LILC maps (for the trivial topology) with the
same pipeline [2^,. It is a necessary (but not sufficient)
condition to trust our simplified analysis that the
results from these maps are consistent with our results
for an infinite universe. As an illustration we plot in
Fig. 7 the distribution of $\chi^2$ for our simple infinite universe
maps (black solid histogram) and for the simulated
LILC maps which contain noise and foreground contributions
(red dashed histogram). We see that the two
distributions agree quite well, to within their own variance.
The variance observed between the different reconstructed
sky maps (WMAP, TC, TW and LILC) is of
the same order of magnitude. This example is for T[2,2,2]
and $\ell_{\max} = 16$, but it is representative of the other cases.

FIG. 7: The distribution of the $(\chi^2)_\infty$ estimator values when
testing for a T[2,2,2] universe with $\ell_{\max} = 16$. The black
solid histogram is computed from 10000 noiseless full-sky
realisations used throughout this paper, while the red dashed
histogram used 1000 simulated LILC maps (see text). The
vertical lines show the $\chi^2$ values of the measured maps, from
the left LILC, TW and WMAP (coincident) and TC.

For our standard example, the T[4,4,4] template, we
find a maximal value for the first-year WMAP ILC map
of $A_{\max} = 0.20$. This is about what is expected for an infinite
universe. A universe exhibiting a genuine T[4,4,4] topology
should lead to roughly $A_{\max} = 1$.



topology      $\ell_{\max}$   $\chi^2$   $P_\infty$   $P_B$       A       $P_\infty$   $P_B$
T[2,2,2]      60              33130      0.39                     0.087   0.20
T[4,4,4]      60              11020      0.40                     0.20    0.64
T[6,6,1]      16              8805       0.85         $10^{-6}$   0.37    0.29         $10^{-5}$
T[15,15,6]    16              290        0.95         0.01        1.6     0.16         0.84

TABLE VII: The value of $\chi^2$ and A obtained for the WMAP
map, together with the probability of measuring such a value
if the universe is infinite ($P_\infty$) and if the universe has indeed
the topology that we test for ($P_B$).

We give in table VII the values of $\chi^2$ and A for the
WMAP map. The values for the other maps are not
very different. We also give two probabilities for both
estimators, $P_\infty$ and $P_B$. The first one is the probability
of measuring a larger value of A (or a smaller value of
$\chi^2$) if the universe is infinite. $P_B$ on the other hand is
the probability of measuring a smaller value of A (or a
larger value of $\chi^2$) if the universe has indeed the topology
that we tested for. For a non-detection of any topology
we require $P_\infty$ to be not too small. A positive detection
of a topology on the other hand requires a larger $P_B$. If
both probabilities are large then we cannot detect that
topology (as exemplified e.g. for the case of T[15,15,6]).
We compute these probabilities with the best-fitting
theoretical PDF, as discussed in appendix B.

FIG. 8: Median and 95% confidence limits as measured with
the $\chi^2$ estimator for infinite universes (upper green limits)
and universes with a T[X,X,1] topology (red lower limits), as a
function of size X in units of $1/H_0$. We also plot the $\chi^2$ values of
the WMAP map (red crosses), the TW map (cyan triangles),
the TC map (blue circles) and the LILC map (magenta stars).
All sky maps are consistent with an infinite universe and not
consistent with a T[X,X,1] topology for any X. We also plot
error bars for the LILC map simulations.

FIG. 9: The same as Fig. 8 for the T[15,15,X] topology. Again
all WMAP maps are consistent with an infinite universe, but
we can only rule out the universes with X < 3 at more than
95% CL.

Fig. 8 shows 95% confidence limits (estimated numerically
from $10^4$ samples) when testing for the presence
(red, lower band) or absence (green, upper band) of a
T[X,X,1] topology. The WMAP data (points) are all
compatible with the infinite universe and rule out this
kind of topology very strongly. The bounds from the
simulated LILC maps (black error bars) are consistent
with our simulated maps with a trivial topology, but
systematically a bit lower. We plot the same in Fig. 9 for a
T[15,15,X] topology. Again, WMAP is compatible with
the infinite universe. But as discussed before, we cannot
detect these universes for $X > 3/H_0$. Overall, all results
are consistent with an infinite universe.
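The two probabilities $P_\infty$ and $P_B$ used in this section can be illustrated with a short sketch. The text computes them from a best-fitting theoretical PDF; the version below instead uses plain empirical fractions over simulated estimator values, which is a simpler stand-in and not the procedure of the paper. The function name `tail_probabilities` and the sample numbers are hypothetical.

```python
def tail_probabilities(observed, infinite_samples, topology_samples, estimator="A"):
    """Empirical stand-ins for P_inf and P_B.

    For the correlation amplitude A a topology shows up as a LARGE value,
    for chi^2 as a SMALL value, so the tails are swapped between the two.
    """
    n_inf = len(infinite_samples)
    n_top = len(topology_samples)
    if estimator == "A":
        p_inf = sum(v > observed for v in infinite_samples) / n_inf
        p_b = sum(v < observed for v in topology_samples) / n_top
    elif estimator == "chi2":
        p_inf = sum(v < observed for v in infinite_samples) / n_inf
        p_b = sum(v > observed for v in topology_samples) / n_top
    else:
        raise ValueError("estimator must be 'A' or 'chi2'")
    return p_inf, p_b


# Made-up numbers, only to show the call; they are not values from the paper.
print(tail_probabilities(0.2, [0.10, 0.15, 0.25, 0.30], [0.9, 1.0, 1.1], "A"))
```

With only a few samples the empirical fractions are coarse, which is one reason a fitted PDF is attractive when very small probabilities are needed.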



VII. BAYESIAN MODEL SELECTION 

The likelihood can also be used in a purely Bayesian
approach. We are interested in the probability of a model
given the data, $p(M|d)$. If all topologies are taken to
be equally probable, then through Bayes' theorem the
statistical evidence $\mathcal{E}(M)$ for a model is proportional to
the probability of that model, given the data. Using the
three Euler angles as parameters $\Theta$, defining the model
$M$ to be a given topology, and the data $d$ the measured
$a_{\ell m}$, we can write the model evidence as

$$\mathcal{E}(M) \propto p(d|M) = \int \mu(\Theta)\,\pi(\Theta)\,\mathcal{L}(\Theta), \qquad (26)$$

where $\pi(\Theta)$ is the prior on the orientation of the map,
see e.g. [24]. The ratio of the evidence for two topologies
is a Bayesian measure of the relative probability. We
can think of it as the relative odds of the two topologies.
A similar method to constrain the topology was applied
previously to the COBE data, see |2q.

The measure $\mu(\Theta)$ on SO(3) needs to be independent
of the orientation [30], which essentially singles out the
Haar measure (up to an irrelevant constant). In terms
of the Euler angles it is $d\alpha\,d\beta\,d\gamma\,\sin(\beta)/(8\pi^2)$ with $\alpha$ and
$\gamma$ going from 0 to $2\pi$, and $\beta$ from 0 to $\pi$. The volume
of SO(3) is then $\int \mu(\Theta) = 1$. A simple way to generate
random orientations is to select $\alpha$ and $\gamma$ uniformly in
$[0, 2\pi]$ and $u$ in $[-1, 1]$ and then set $\beta = \arccos(u)$.
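The sampling recipe just described can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the helper name `random_orientations` is hypothetical, and the code only draws Euler angles without rotating any map.

```python
import math
import random


def random_orientations(n, seed=None):
    """Draw n orientations (alpha, beta, gamma) from the Haar measure on SO(3).

    alpha and gamma are uniform in [0, 2*pi]; beta = arccos(u) with u
    uniform in [-1, 1], which reproduces the sin(beta) weight of the measure.
    """
    rng = random.Random(seed)
    orientations = []
    for _ in range(n):
        alpha = rng.uniform(0.0, 2.0 * math.pi)
        gamma = rng.uniform(0.0, 2.0 * math.pi)
        u = rng.uniform(-1.0, 1.0)
        beta = math.acos(u)
        orientations.append((alpha, beta, gamma))
    return orientations


sample = random_orientations(20000, seed=1)
# For the Haar measure cos(beta) is uniform in [-1, 1], so its mean is near 0.
print(sum(math.cos(b) for _, b, _ in sample) / len(sample))
```

Drawing $\beta$ uniformly instead would over-sample orientations near the poles, which is why the intermediate variable $u$ is used.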

The advantage of using Bayesian evidence is that it
provides a natural probabilistic interpretation which
depends only on the actually observed data, but not on
simulated data sets. Because of this, there is no need to
run large comparison sets. This is a very different viewpoint
from the frequentist approach followed so far.

For an infinite universe the correlation matrix is diagonal
and rotationally invariant (due to isotropy). The
integral over the alignment becomes trivial in this case.
If we use whitening then the correlation matrix is just
the unit matrix and we have

$$\chi^2 = \sum_s |a_s|^2 = s_{\max}. \qquad (27)$$

The second equality is due to the whitening. The likelihood
is then

$$\mathcal{L} \propto e^{-\frac{1}{2}\chi^2(\Theta)} = \mathrm{const} \times e^{-s_{\max}/2}, \qquad (28)$$

where the constant normalisation is independent of the
topology. We will neglect it as it drops out when comparing
the evidence for different models. This "infinite"
evidence gives us a reference point; with our choice of
measure on SO(3) and of normalisation it is

$$-\log(\mathcal{E}_\infty) = \frac{s_{\max}}{2}. \qquad (29)$$

On the other hand, if the universe is infinite then we
know that the expected $\chi^2$ is the trace of the inverse
of the correlation matrix that we test for. It is again
rotationally invariant as $\langle a_s a^*_{s'} \rangle$ is rotationally invariant.
The log-evidence is on average

$$-\log(\mathcal{E}) = \frac{1}{2}\left(\mathrm{tr}(B^{-1}) + \log|B|\right). \qquad (30)$$

We notice that the expected log-evidence difference to the
true infinite universe is the Kullback-Leibler divergence,

$$\Delta\log(\mathcal{E}) = D_{KL}(1\|B) = \frac{1}{2}\left(\log|B| + \mathrm{tr}(B^{-1} - 1)\right). \qquad (31)$$

We should not forget though that this is a very crude
approximation to the evidence. Nonetheless, Eq. (31) gives
a useful indication of the odds that we can detect a given
topology, as it can be evaluated very rapidly, without
performing the integration over orientations. Fundamentally,
this is the amount of additional information about
topology contained in the correlation matrix B. If the
amount of information is not sufficient to distinguish it
from an infinite universe, no test will ever be able to tell
the two apart. We discuss the Kullback-Leibler (KL)
divergence and its possible applications in more detail in
appendix C.
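Because the Kullback-Leibler divergence between the unit matrix and B, $\frac{1}{2}(\log|B| + \mathrm{tr}(B^{-1} - 1))$, depends on B only through its eigenvalues, it is indeed cheap to evaluate. The sketch below assumes that the eigenvalues of the (whitened) correlation matrix are already available; the function name `kl_divergence_from_unit` and the example eigenvalues are hypothetical.

```python
import math


def kl_divergence_from_unit(eigenvalues):
    """D_KL(1||B) = 0.5 * (log|B| + tr(B^-1 - 1)) from the eigenvalues of B.

    Uses log|B| = sum(log e_i) and tr(B^-1) = sum(1 / e_i); the result is
    in natural-log units.
    """
    if any(e <= 0.0 for e in eigenvalues):
        raise ValueError("B must be positive definite")
    return 0.5 * sum(math.log(e) + 1.0 / e - 1.0 for e in eigenvalues)


print(kl_divergence_from_unit([1.0, 1.0, 1.0]))  # unit matrix
print(kl_divergence_from_unit([0.5, 1.0, 1.5]))  # made-up spectrum
```

Each term $\log\epsilon + 1/\epsilon - 1$ is non-negative and vanishes only for $\epsilon = 1$, so the divergence is zero exactly when B is the unit matrix.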

Of course, faced with real data we have to evaluate the
actual evidence integral. Unfortunately the likelihood is
extremely strongly peaked around the correct alignments
(especially for a non-trivial topology), and it is very difficult
to sample from it. Already the A and $\chi^2$ estimators
require a very precise alignment to reach the true maximum
or minimum. Exponentiating $-\chi^2$ leads to much
narrower peaks in the extrema, and makes the problem
far worse. In Fig. 10 we plot the relative likelihood (normalised
to unity at the peak) for a universe with T[4,4,4]
topology close to a correct alignment (the vertical line),
and for different $\ell_{\max}$. The broadest peak corresponds
to $\ell_{\max} = 16$, and we added the location of $10^4$ points
evenly spaced between 0 and $2\pi$ as black crosses. This
corresponds to a total of roughly $10^{11}$ points to cover
all of SO(3). For $\ell_{\max} = 16$ we could get away with
using only every 10th point (about $10^8$ points in total)
and still detect the high-likelihood region. But not so
for $\ell_{\max} = 32$ and 60 (the narrower peaks), which would
easily be missed.



This renders methods like thermodynamic integration
infeasible. On the other hand, we are dealing only
with three parameters. Direct integration is therefore
marginally possible by using an adaptive algorithm. For
$\ell_{\max} = 16$ we need to start out with at least $10^6$ points in
order to detect the high-probability regions at all. This
means that we have to count on $10^7$ to $10^8$ likelihood
evaluations. The situation gets worse for higher resolution
maps, as the likelihood evaluations require
more time and the high-probability regions shrink. We
therefore only quote results for $\ell_{\max} = 8$ in this section.




FIG. 10: Relative likelihood for a T[4,4,4] topology around
one of the symmetry points where a simulated T[4,4,4] map
aligns correctly. The broadest (black) curve is for $\ell_{\max} = 16$,
the intermediate (red) curve $\ell_{\max} = 32$ and the narrowest
(blue) curve $\ell_{\max} = 60$. The vertical green line lies at
$\phi = \pi/2$. The crosses show the location of $10^4$ points between
$\phi = 0$ and $\phi = 2\pi$.



topology      $\ell_{\max}$   WMAP    TC          TW          LILC        $D_{KL}(1\|B)$
$\infty$      8               -17     -17         -17         -17
T[2,2,2]      8               -114    -103        -100        -102        172
T[4,4,4]      8               -46     -41         -47         -44         64
T[6,6,1]      8               -526    $-\infty$   $-\infty$   $-\infty$   1733
T[15,15,6]    8               -17     -18         -18         -17         1

TABLE VIII: The log-evidence $\log_{10}(\mathcal{E})$ for a range of topologies
and data maps (see text). We also quote the KL divergence
with respect to an infinite universe for comparison.
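As a rough consistency check of the infinite-universe row, one can evaluate the reference value $-\log(\mathcal{E}_\infty) = s_{\max}/2$ for a whitened map. The sketch below assumes that the logarithm in that expression is the natural one and that the modes run from $\ell = 2$ to $\ell_{\max}$ (monopole and dipole excluded); both are assumptions made here, not statements from the text, and the function names are hypothetical.

```python
import math


def s_max(l_max, l_min=2):
    """Number of (l, m) modes with l_min <= l <= l_max."""
    return sum(2 * l + 1 for l in range(l_min, l_max + 1))


def log10_evidence_infinite(l_max, l_min=2):
    """log10 of the 'infinite' evidence, from -ln(E_inf) = s_max / 2."""
    return -0.5 * s_max(l_max, l_min) / math.log(10.0)


print(s_max(8))
print(round(log10_evidence_infinite(8), 1))
```

Under these assumptions there are 77 modes for $\ell_{\max} = 8$ and the result is close to the value of -17 quoted in the table.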



As the data sets which define our likelihood we use
the same four maps as in the frequentist analysis: the
ILC map by the WMAP team (WMAP), two maps by
Tegmark et al., the Wiener filtered map (TW) and the
foreground-cleaned map (TC), and the ILC map by Eriksen
et al. (LILC). We quote the logarithm (to base 10) of
the evidence in table VIII for our usual range of example
models. The relevant quantity for model comparison is
the difference of these values (corresponding to the ratio
of the probabilities). If the log evidence of a model A is
3 higher than the log evidence of model B, we conclude
that the odds for model A are $10^3$ times better. This can
be seen as fairly good odds in favour of model A. We plot
in Fig. 14 the correspondence between the logarithm of a
probability ratio and the number of standard deviations
($\sigma$) for a Gaussian random variable.
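The translation between a difference in $\log_{10}$ evidence, the corresponding odds, and a number of Gaussian standard deviations can be sketched as follows. The convention used here (the number of $\sigma$ whose two-sided Gaussian tail probability equals the inverse odds) is an assumption chosen for illustration; the figure referred to in the text may use a different convention. The function names are hypothetical.

```python
from statistics import NormalDist


def odds_from_log10_difference(delta_log10):
    """Odds ratio for a difference delta_log10 in log10 evidence."""
    return 10.0 ** delta_log10


def equivalent_sigma(delta_log10):
    """Number of sigma whose two-sided Gaussian tail probability is 10**(-delta).

    ASSUMPTION: two-sided tail convention; other conventions exist.
    """
    p = 10.0 ** (-delta_log10)
    if not 0.0 < p < 1.0:
        raise ValueError("delta_log10 must be positive")
    # Use the lower tail so that very small p does not round to 1.0.
    return -NormalDist().inv_cdf(p / 2.0)


print(odds_from_log10_difference(3))
print(round(equivalent_sigma(3), 2))
```

In this convention a difference of 3 in $\log_{10}$ evidence corresponds to a little over 3$\sigma$.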

All topologies except T[15,15,6] are excluded at high
confidence. The evidence values for the different reconstructed
CMB maps agree at least qualitatively. We plot
in Fig. 11 the evidence of the T[15,15,X] cases as a function
of X. The two smallest universes are strongly excluded,
X = 2 could be excluded if we used a higher resolution,
and the rest are too close to the infinite universe
to be constrained. We also plot the mean and standard
deviation of the simulated LILC maps as error bars. The
T[X,X,1] cases are all so completely excluded that the
integral is just barely feasible given the huge numbers
involved.

We would like to remind the reader that the results in
this section are always relative to the observed map. It is
therefore a bit worrying that the evidences differ by several
orders of magnitude when we consider the different
full-sky reconstructions. We also checked the stability
of the results for 1000 simulations of the LILC map with
known (trivial) topology. We found it to be rather poor
(cf. the large error bars in Fig. 11), although this may be
partially due to the smaller range of $\ell$. Another possible
source for this lack of stability is our simplistic likelihood.
The Bayesian interpretation of the results is only valid if
we are able to derive the correct likelihood. This is an
important difference to the frequentist results where we
calibrate the statistical interpretation with the comparison
sets. In the frequentist scenario, we may end up
with a sub-optimal test, but we will not get wrong results
if we use the wrong likelihood function. Not so in
the Bayesian case, which forces us to be more careful. A
possible way out is to reconstruct a likelihood from the
set of simulated LILC maps.

Normally, a difference of 2 to 3 in $\log_{10}(\mathcal{E})$ is taken
to be sufficient to strongly disfavour a model against another
one. This may be reasonable for a full analysis that
takes into account all the issues discussed in the following
section. For full-sky reconstructed maps we feel that we
should require at least a difference of 10. Overall it seems
that the frequentist approach leads to results which are
more stable against the uncertainties introduced by the
full-sky reconstruction and foreground removal.



VIII. COSMIC COMPLICATIONS 

This paper aims at introducing and discussing the different
methods for constraining the topology of the universe
in harmonic space. In doing so we study an idealised
situation with perfect data, neglecting several issues
that are present in the real world. Here we give
a quick overview of the main complications that will
have to be dealt with for a rigorous analysis. Clearly
they will change the quantitative results presented here,
but we do not expect that they will lead to qualitative
changes in the results.

FIG. 11: The evidence of a T[15,15,X] topology with $\ell_{\max} = 8$
for four different full-sky reconstructions of the WMAP data
(WMAP red crosses, TW cyan triangles, TC blue circles and
LILC magenta stars). The black error bars are derived from
simulated LILC maps. They are consistent with the actual
LILC data map. The green line shows the predicted evidence
of an infinite universe.



A. Noise 

If we assume constant and independent per-pixel noise
$\sigma_N$ then the covariance matrix of the $a_{\ell m}$ acquires an
additional diagonal term,

$$\langle a^*_s a_{s'} \rangle + \sigma_N^2\,\delta_{ss'}. \qquad (32)$$

This is fairly close to what many CMB experiments
(like WMAP and Planck) expect for their data. The
CMB power spectrum on large scales behaves roughly
like $1/\ell^2$ (Harrison-Zel'dovich) with a power of about
$C_{10} \approx 60\,\mu{\rm K}^2$. For any experiment that probes scales
beyond the first peak, we can conclude that the large
scales ($\ell < 100$, say) are completely signal dominated.
Taking WMAP as an example, we see that Fig. 1 of |2rj|
gives a noise contribution to the $C_\ell$ of 0.1 to 0.6 $\mu{\rm K}^2$,
depending on the assembly. As the noise additionally (to
first order) does not enter in the off-diagonal terms, we
can safely neglect it for a first analysis.

More generally we expect a fixed noise variance per detector
and per observation. The resulting per-pixel noise
is $\sigma_N(x) = \sigma_0/\sqrt{N_{\rm obs}(x)}$. Turning again to WMAP as an
example, we find that they cite a noise variance $\sigma_0 \approx 2 - 7$
mK. Expressed in terms of the spherical harmonic coefficients,
the correlation matrix in this scenario becomes

$$\langle a^*_s a_{s'} \rangle + \sigma_0^2 \int d^2x\, N_{\rm obs}^{-1}(x)\, Y^*_s(x)\, Y_{s'}(x), \qquad (33)$$

where the integration runs over all pixels $x$. Because of
its spatial variation, the noise is no longer confined to
the diagonal and should strictly speaking be taken into
account. But the off-diagonal terms will still be very
small. The most straightforward way to include the noise
is to simulate maps with the correct power spectrum and
noise properties and to co-add them. This is especially
the case when we deal with a complicated sky cut (see
below).
The ILC maps that we used here have more complicated
noise properties due to the full-sky reconstruction.
But the noise itself will still be negligible on large scales,
compared to the signal. More worrying are potential
foreground contaminations that were not completely subtracted.
We explore that problem partially in section VI
by using simulated LILC maps.

B. Uncertainties in the cosmological parameters 

So far we have used correlation matrices computed for
a fixed cosmological model. But there are still significant
uncertainties present in the true value of the cosmological
parameters, and even in the underlying cosmological
model. An example was recently discussed in |2jj. In
principle we have to take such uncertainties into account.
For the Bayesian model selection approach, we could do it
straightforwardly by marginalising over them. Of course
this may mean computing a large number of correlation
matrices for different cosmological models, which would
lead to a computational challenge. Alternatively, one
should consider a selection of models and incorporate the
variance of the correlations into a systematic error on the
correlation matrices.

In practice, we hope that the whitening which eliminates
differences in the power spectrum will also minimise
the effects due to this parameter uncertainty. At
the very least it will do so for the "infinite universe" tests
where no off-diagonal correlations are present. The result
that the full-sky WMAP maps are compatible with
an infinite universe is thus not affected by the parameter
uncertainty.

C. The integrated Sachs-Wolfe effect

An issue somewhat related to the last point is that
not all perturbations are generated on the last scattering
surface. Some of them are due to the integrated Sachs-Wolfe
(ISW) effect. Especially perturbations due to the
late ISW effect that are generated relatively close to us
are then not affected by the global topology and carry
no information about it. They act as a kind of noise for
our purposes. This contribution is especially problematic
when searching for matching circles in pixel space. It is
readily included when working with the correlation matrices,
even though it will also be subject to the parameter
uncertainties and it will lower our detection power
substantially.

The rapid decrease of the late ISW effect with increasing
$\ell$ provides an additional incentive for probing smaller
scales, $\ell \approx 40 - 60$.

D. Sky cuts 

Here we have only considered full-sky maps. Unfortunately
a large part of the sky-sphere is covered by our
galaxy which leads to foregrounds that are not easy to
subtract and obscure the true CMB signal. The most
conservative approach is therefore to remove a part of
the sky via a sky cut. This amounts to introducing a
mask $M(x)$ in pixel space, with value 1 on the pixels $x$
where the CMB signal is clean, and 0 in the contaminated
parts of the sky. We then consider the pseudo-$a_{\ell m}$

$$\tilde{a}_{\ell m} = \int d^2x\, M(x)\, \delta T(x)\, Y_{\ell m}(x) \qquad (34)$$

instead of the true $a_{\ell m}$. We can perform the masking
operation directly in harmonic space, using the spherical
Fourier transform of the mask,

$$M_{ss'} = \int d^2x\, M(x)\, Y_s(x)\, Y^*_{s'}(x). \qquad (35)$$

The relation between the true $a_{\ell m}$ and the observed
pseudo-$a_{\ell m}$ is then given by $\tilde{a}_s = \sum_{s'} M_{ss'} a_{s'}$.
Unfortunately the mask matrix $M$ corresponds to a loss of
information and can in general not be inverted. We could
of course use SVD to invert it, and eliminate the small
singular values. However, this would be quite similar to
a full-sky reconstruction. Instead, it may be preferable
to apply the sky cut to the correlation matrix as well.
The resulting pseudo-correlation matrix is then

$$\tilde{B} = M^\dagger B M. \qquad (36)$$

This leads to two problems. The first one is purely
computational: the sky cut has a fixed orientation (with
respect to the $a_{\ell m}$). So far it did not matter if we rotated
the correlation matrix or the $a_{\ell m}$, as only the relative
orientation counted. But since the sky cut defines
an absolute orientation we now need to apply the rotation
to the correlation matrix. Rotating the correlation
matrix is considerably more costly than rotating the observed
$a_{\ell m}$. The use of the eigenvector decomposition
(15) and rotation of the effective spherical harmonics $b_{\ell m}$
can somewhat alleviate the situation if only a few eigenvalues
dominate the sum.

The second problem is that a sky cut and its associated
mask matrix introduce just the kind of correlations
between different $a_{\ell m}$ that we are looking for. A sky cut
will impact significantly on our ability to constrain large
universes. We will have to either accept this limitation,
or hope that better full-sky reconstruction and component
separation methods (for example j2^) will become
available. However, one would have to demonstrate that
such methods do indeed not change the correlation properties
of the $a_{\ell m}$ in a way that influences the detection of
a topology signature. At the very least one has to consider
such effects as systematic errors and include them
in the error budget of a full analysis.
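Both effects of a sky cut, the coupling between different modes and the loss of information, can be seen in a small toy model. The sketch below replaces the sphere by a ring of pixels and the spherical harmonics by discrete Fourier modes, which is an assumption made purely for illustration; the function names are hypothetical, and the mode-coupling matrix is built in analogy to the mask matrix $M_{ss'}$ of Eq. (35).

```python
import cmath


def basis(k, p, n):
    """Orthonormal Fourier mode k evaluated at pixel p of an n-pixel ring."""
    return cmath.exp(2.0 * cmath.pi * 1j * k * p / n) / n ** 0.5


def mask_matrix(mask):
    """M[k][kp] = sum_p mask[p] * e_k(x_p) * conj(e_kp(x_p))."""
    n = len(mask)
    return [[sum(mask[p] * basis(k, p, n) * basis(kp, p, n).conjugate()
                 for p in range(n))
             for kp in range(n)]
            for k in range(n)]


def apply_mask(matrix, coeffs):
    """Pseudo-coefficients: a_tilde[k] = sum_kp M[k][kp] * a[kp]."""
    return [sum(m * a for m, a in zip(row, coeffs)) for row in matrix]


mask = [1, 1, 1, 0, 0, 1, 1, 1]  # pixels 3 and 4 are cut
M = mask_matrix(mask)
# Off-diagonal entries are non-zero: the cut couples different modes.
print(abs(M[0][1]))
# A signal that lives only in a masked pixel is mapped to zero, so M
# cannot be inverted.
hidden = [basis(k, 3, len(mask)) for k in range(len(mask))]
print(max(abs(x) for x in apply_mask(M, hidden)))
```

Without a cut the same construction returns the unit matrix, so all off-diagonal structure in this toy model is due to the mask alone.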



CMB temperature data sets, although some improve- 
ment may come from better foreground separation with 
more frequencies, and from e.g. using polarisation maps 
in addition to the temperature maps. Short of waiting 
a few billion years for the universe to expand further, 
these tests and especially the information theoretical lim- 
its provided by the Kullback-Leibler divergence give us 
an idea about what we can learn of the shape of our 
universe. 



IX. CONCLUSIONS AND OUTLOOK 

In this paper we have studied three ways to constrain 
the topology of our universe directly with the correla- 
tion matrix of the ag m . If the primordial fluctuations are 
Gaussian then these correlation matrices contain all the 
information about the global shape of our universe that 
is carried by the CMB. By trying to find their traces in 
the measured ai m we can construct the most sensitive 
probes possible. 

We studied two frequentist estimators, A which de- 
scribes the correlation amplitude between the theoret- 
ical correlation matrix B and the measured 0£ m , and 
X 2 = <vB~ x a. Although A has certain advantages at high 
i by leaving out the diagonal terms, we found the x 2 to be 
generally superior after taking into account the random 
orientation of the observed map. We also computed the 
Bayesian evidence, which we found to be a very sensitive 
probe. But the angular integration is computationally 
very intensive, especially at high resolutions. Addition- 
ally, much care is needed in constructing the likelihood 
function. For these reasons, the % 2 minimised over rota- 
tions seems the most useful of our tests. 

For our scenario we find that even high multipoles, 
I > 50, still carry important information about the topol- 
ogy. However, the amount of work needed to extract the 
information scales as a high power of I. For most cases 
£ « 30 — 40 seems a sufficient upper limit. 

Finally, we apply our methods to a set of reconstructed
full-sky maps based on WMAP data. For all topologies
considered (cubic and slab tori) we find no hints of a
non-trivial topology. Based on the exclusion of the T[4,4,4]
topology, we conclude that the fundamental domain is
at least 19.3 Gpc long if it is cubic. We rule out (not
very surprisingly) any universe where a fundamental
domain in any direction is smaller than 4.8 Gpc (based on
the T[X,X,1] cases). If the universe is infinite in two
directions, then the third direction has to be larger than
14.4 Gpc. These limits still allow two copies of the
universe inside the current particle horizon. We prefer to
understand this analysis as a demonstration of our
methods, as we neglected a range of important issues such as
the ISW effect.

The noise of the WMAP data is already cosmic vari- 
ance dominated on the scales of interest. Future exper- 
iments will not be able to provide significantly better 

APPENDIX A: FINDING AN OPTIMAL
ESTIMATOR

It is interesting to compare the expressions for the $\chi^2$
and the $\lambda$ estimator. The philosophy of the two
approaches is very different. In the first case we write down
a likelihood function for a given covariance matrix. In
the second case we correlate the noisy estimated
covariance matrix with a theoretical model. We then use the
correlation amplitude $\lambda$ as a measure of goodness-of-fit.
To compare the two methods, we use the eigen-space
expansion (15). As $B$ is hermitian, the eigenvalues are real;
if we use the full covariance matrix, which is positive
definite, the eigenvalues are also positive.

Introducing this expansion into the expression for $\chi^2$
we find

$$\chi^2 = \sum_i \frac{1}{\epsilon^{(i)}} \left| \sum_s a_s^* v_s^{(i)} \right|^2 . \tag{A1}$$



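To compute the same for the correlation amplitude, we
use that the eigenvectors are normalised and orthogonal,
$\sum_s v_s^{(i)} v_s^{(j)*} = \delta_{ij}$. The autocorrelation is then simply
$\sum_{ss'} |B_{ss'}|^2 = \sum_i (\epsilon^{(i)})^2$ and the correlation amplitude is

$$\lambda = \frac{\sum_i \epsilon^{(i)} \left| \sum_s a_s^* v_s^{(i)} \right|^2}{\sum_i (\epsilon^{(i)})^2} . \tag{A2}$$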



If one eigenvalue dominates, then the two expressions
coincide. If all eigenvalues are equal, then $\chi^2 = s_{\max} \lambda$.
This happens for an infinite universe if we normalise it by
the power spectrum. In both cases the statistical
properties are equal.
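
The two limits quoted above can be checked numerically. The following is a minimal sketch, not the code used for the paper: it assumes that $\lambda$ is evaluated with the full matrix $B$ (diagonal terms included), and the function name `chi2_and_lambda` is purely illustrative. It compares the direct expressions with the eigen-space sums of Eqs. (A1) and (A2).

```python
import numpy as np

def chi2_and_lambda(B, a):
    """Return (chi2_direct, chi2_eigen, lambda_direct, lambda_eigen).

    B is a hermitian, positive definite correlation matrix and a the
    vector of harmonic coefficients (assumption: full B, diagonal kept).
    """
    eps, V = np.linalg.eigh(B)          # real eigenvalues, orthonormal columns
    X = np.abs(V.conj().T @ a) ** 2     # X_i = |projection of a on eigenvector i|^2
    chi2_eigen = np.sum(X / eps)                      # Eq. (A1)
    lam_eigen = np.sum(eps * X) / np.sum(eps ** 2)    # Eq. (A2)
    chi2_direct = np.real(a.conj() @ np.linalg.solve(B, a))
    lam_direct = np.real(a.conj() @ B @ a) / np.sum(np.abs(B) ** 2)
    return chi2_direct, chi2_eigen, lam_direct, lam_eigen
```

For a matrix proportional to the identity the first and third return values differ by exactly the factor $s_{\max}$, which is the equal-eigenvalue limit described in the text.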

In the intermediate cases we see that both correspond
to a different weighting of the correlations between the
eigenvectors and the $a_{\ell m}$. The question arising now is
whether we can determine an optimal weighting that
leads to the smallest variance if the $a_{\ell m}$ are drawn
either from an infinite universe or from one with covariance
matrix $B$. If the two requirements are not the same, it
is preferable to optimise with respect to the infinite
universe, as large universes will be close to this case.

Let us, as an example and guided by the above
discussion, postulate a general estimator

$$I = \sum_i \alpha^{(i)} X_i \tag{A3}$$

where we use the structure that we observed above,

$$X_i = \left| \sum_s a_s^* v_s^{(i)} \right|^2 . \tag{A4}$$

The expectation value and the variance of the general
estimator are

$$\langle I \rangle = \sum_i \alpha^{(i)} \langle X_i \rangle \tag{A5}$$

$$\sigma^2 \equiv \langle I^2 \rangle - \langle I \rangle^2 = \sum_{ij} \alpha^{(i)} \alpha^{(j)} \left( \langle X_i X_j \rangle - \langle X_i \rangle \langle X_j \rangle \right) \tag{A6}$$

The aim is to find the $\alpha^{(i)}$ that minimise the variance
of the estimator, subject to a normalisation constraint.
We are going to consider several different limits. The
simplest example is the case where the eigenvectors are
due to an infinite universe, in which case $v_s^{(i)} = \delta_{is}$.
It is now easy to see that $\langle X_i \rangle = \langle a_i a_i^* \rangle = C_i$ and
$\langle X_i X_j \rangle = C_i C_j + 2 |A_{ij}|^2$ where $A_{ij}$ is the covariance
matrix from which the observed $a_{\ell m}$ are drawn. The
expectation value and variance of the general estimator are
now

$$\langle I \rangle = \sum_i \alpha^{(i)} C_i \tag{A7}$$

$$\sigma^2 = 2 \sum_{ij} \alpha^{(i)} \alpha^{(j)} |A_{ij}|^2 \tag{A8}$$

Adding the constraint $\sum_i \alpha^{(i)} = s_{\max}$ with a Lagrange
multiplier $l$ we have to minimise the expression

$$2 \sum_{ij} \alpha^{(i)} \alpha^{(j)} |A_{ij}|^2 + l \left( \sum_i \alpha^{(i)} - s_{\max} \right) \tag{A9}$$

The relevant system of equations is found as usual by
computing the first derivatives with respect to $l$ and the
coefficients and setting them to zero to find the extrema:

$$\sum_i \alpha^{(i)} = s_{\max} \tag{A10}$$

$$l + 4 \sum_i \alpha^{(i)} |A_{ik}|^2 = 0 \quad \forall k = 1, \ldots, s_{\max} \tag{A11}$$

This is a linear system which can be solved via matrix
inversion. For the simplest case where $A_{ij} = C_i \delta_{ij}$ (the
observed $a_{\ell m}$ are also drawn from an infinite universe
with power spectrum $C_i$) we can write down the solution
up to a normalisation constant:

(A12)

We assume that both the template and sky have the same
power spectrum $C_\ell$. In our case this also means that the
eigenvalues of $A$ are $\epsilon^{(i)} = C_i$. The minimum variance
estimator is therefore proportional to the $\chi^2$. On the
other hand, after whitening $C_i = 1$ and both estimators
become equivalent.

It is also easy to consider the case when the $a_s$ are
distributed according to the same correlation matrix $B$
that we compare them with. As the eigenvectors are
orthonormal, we find that

$$\langle X_i \rangle = \epsilon^{(i)} \tag{A13}$$

$$\langle X_i X_j \rangle = \epsilon^{(i)} \epsilon^{(j)} \left( 1 + 2 \delta_{ij} \right) \tag{A14}$$

The variance of the estimator is then

$$\sigma^2 = 2 \sum_i \left( \alpha^{(i)} \right)^2 \left( \epsilon^{(i)} \right)^2 \tag{A15}$$

This is minimised by

$$\alpha^{(i)} \propto \ldots \tag{A16}$$

as before. The $\chi^2$ estimator has therefore the minimal
variance in this case as well.

However, we see from tables III and IV that the dominant
error is not $\sigma_B$ but $\sigma_\infty$. We should therefore try to
minimise this variance instead. Here the $a_{\ell m}$ are those
of an infinite universe while the eigenvectors are those
of the correlation matrix $B$. It is possible to derive an
optimal estimator for this case, but it is rather unwieldy.

Finally, our aim is to maximise the detection of a given
topology. This is not necessarily the same as minimising
the variance as discussed above. Firstly, the discussion
necessarily disregards the deviations introduced by
dividing the $a_{\ell m}$ by their own power spectrum. Secondly, the
$\lambda$ correlation estimator gains power by leaving out the
diagonal terms. And thirdly, we use the extremum over
all orientations which will also change the results.



APPENDIX B: EXTREME VALUE 
DISTRIBUTIONS 

Computing the extrema of the estimators for a large
number of cases takes a lot of CPU time. It is important
to use this information efficiently, for example by fitting
a theoretically motivated distribution function. We try
to derive such a fitting distribution by considering all
rotations as independent random realisations. We then use
the maximum or minimum, depending on the estimator.
This is known as extreme value statistics [l^. For
example in the case of $\lambda$ we found that its distribution is
nearly Gaussian. With our approximations, the cumulative
distribution function (CDF) of the maximum out of $n$ draws is

\begin{align}
C_n(z) &= p[\max(\lambda_1, \ldots, \lambda_n) < z] \tag{B1} \\
       &= p(\lambda_1 < z, \ldots, \lambda_n < z) \tag{B2} \\
       &= \prod_{i=1}^{n} p(\lambda_i < z) \tag{B3} \\
       &= C(z)^n . \tag{B4}
\end{align}



Here $C(z) = (1 + \mathrm{erf}(z/\sqrt{2}))/2$ is the cumulative
probability function of a single univariate Gaussian random
variable and $C_n(z)$ the same for the maximum of $n$
independent univariate Gaussians. The median lies at
$C_n(z) = 1/2$ or $C(z) = 2^{-1/n}$. We show in figure 12
the location of the median as a function of $n$. For the
relevant number of independent rotations, we find a shift
of 4 to 6 sigma.
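
As an illustration of the median condition $C(z) = 2^{-1/n}$, the short sketch below solves it with the Python standard library. The function name `median_of_max` and the sample values of $n$ are assumptions made for this example only; the number of independent rotations relevant for the analysis is not specified here.

```python
import math
from statistics import NormalDist

def median_of_max(n):
    """Median, in units of sigma, of the maximum of n unit Gaussians."""
    # Solve C(z)^n = 1/2, i.e. C(z) = 2**(-1/n). We work with the upper
    # tail 1 - C(z) = -expm1(-ln2 / n) to keep precision for large n.
    tail = -math.expm1(-math.log(2.0) / n)
    # By symmetry of the Gaussian, the upper-tail point is minus the
    # lower-tail quantile.
    return -NormalDist().inv_cdf(tail)

if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7, 10**9):
        print(n, round(median_of_max(n), 2))
```

Working with the upper tail rather than with $2^{-1/n}$ itself avoids the loss of precision that occurs when the argument of the inverse CDF is extremely close to one.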

FIG. 12: The location of the median value in number of
standard deviations $\sigma$ for the maximum value out of $n$ Gaussian
random variables.



A theorem similar to the central limit theorem says
that there are certain limiting distributions to which the
distribution of an extremal value converges. The limiting
distribution for an unbounded variable like $\lambda$ is the
Gumbel distribution, with a probability distribution function
(PDF) of the form

$$P(x) = \exp(-z - \exp(-z))/\sigma \quad \text{where} \quad z = (x - \mu)/\sigma \tag{B5}$$

(see e.g. [23] for a discussion and another astrophysical
application). The expectation value is $\sigma\gamma + \mu$ and the
variance is $\sigma^2 \pi^2/6$ where $\gamma \approx 0.577$ is the Euler constant.
We can use these two values to find $\sigma$ and $\mu$ given the
variance and expectation value of the distribution.



The CDF is

$$F(x) = e^{-\exp(-z)} . \tag{B6}$$

We can consider e.g. $F(x) = 0.95$ as two-sigma upper
limit. We find that for $N$ of the order of a few thousand,
$5\sigma$ is a very conservative upper bound. Even though the
extreme value distribution moves the expectation values
up or down, the variances around those values can still
remain surprisingly small. The signal to noise ratio need
not decrease because of the shift. Indeed, as discussed in
[V] we find that it often even increases.
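
The moment matching for the Gumbel distribution can be sketched as follows. This is only an illustration under the parametrisation of Eqs. (B5) and (B6); the function names are hypothetical and not taken from any analysis code.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler constant, gamma ~ 0.577

def gumbel_from_moments(mean, variance):
    """Return (mu, sigma) from the mean sigma*gamma + mu and variance sigma^2 pi^2 / 6."""
    sigma = math.sqrt(6.0 * variance) / math.pi
    mu = mean - EULER_GAMMA * sigma
    return mu, sigma

def gumbel_cdf(x, mu, sigma):
    """CDF of Eq. (B6): F(x) = exp(-exp(-z)) with z = (x - mu) / sigma."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def gumbel_quantile(p, mu, sigma):
    """Invert the CDF, e.g. p = 0.95 for the upper limit mentioned in the text."""
    return mu - sigma * math.log(-math.log(p))
```

Given the sample mean and variance of the maximised estimator, `gumbel_quantile(0.95, *gumbel_from_moments(mean, variance))` then returns the value of $x$ at which $F(x) = 0.95$.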

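For a bounded variable like a $\chi^2$ the situation is similar,
except that the limiting distribution is now called Weibull
distribution, with

$$P(x) = \frac{\gamma}{\alpha} \left( \frac{x}{\alpha} \right)^{\gamma - 1} e^{-(x/\alpha)^\gamma} . \tag{B7}$$

The two parameters $\alpha$ and $\gamma$ can be fixed again by
measuring the expectation value $\mu = \alpha \Gamma[1 + 1/\gamma]$ and
variance $\sigma^2 = \alpha^2 (\Gamma[1 + 2/\gamma] - \Gamma[1 + 1/\gamma]^2)$ of the numerical
distribution. The CDF is simply

$$F(x) = 1 - e^{-(x/\alpha)^\gamma} , \tag{B8}$$

and $x > 0$.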

However, we found that this form is a bad fit even to
just the minimum over independent variables with a true
$\chi^2$ type distribution. It seems better to allow for two
different exponents, leading to a PDF of the form

$$P(x) = \frac{\gamma}{\alpha \Gamma[(1 + \beta)/\gamma]} \left( \frac{x}{\alpha} \right)^{\beta} \exp\left\{ -\left( \frac{x}{\alpha} \right)^{\gamma} \right\} . \tag{B9}$$

We call this the extended Weibull distribution. The CDF is

$$F(x) = 1 - \frac{\Gamma[(1 + \beta)/\gamma, (x/\alpha)^\gamma]}{\Gamma[(1 + \beta)/\gamma]} \tag{B10}$$

where $\Gamma[a, b]$ is the incomplete Gamma function, with
$\Gamma[a, 0] = \Gamma[a]$ and $\Gamma[1, x] = \exp(-x)$. We recover the
standard case for $\beta = \gamma - 1$. We found the extended
Weibull distribution to be the best-fitting distribution in
general, even for the $\lambda$ estimator. Figure 13 shows an
example fit to the PDF of $\lambda$ for a T[6,6,1] distribution.
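
The properties of the extended Weibull distribution stated above can be checked with a small standard-library sketch. The function names are illustrative, and the CDF is obtained here by simple midpoint integration of the PDF rather than through the incomplete Gamma function, which is an implementation choice of this example only.

```python
import math

def ext_weibull_pdf(x, alpha, beta, gamma):
    """Extended Weibull PDF with scale alpha and exponents beta, gamma."""
    norm = gamma / (alpha * math.gamma((1.0 + beta) / gamma))
    t = x / alpha
    return norm * t ** beta * math.exp(-(t ** gamma))

def weibull_pdf(x, alpha, gamma):
    """Standard Weibull PDF, the special case beta = gamma - 1."""
    t = x / alpha
    return (gamma / alpha) * t ** (gamma - 1.0) * math.exp(-(t ** gamma))

def ext_weibull_cdf(x, alpha, beta, gamma, steps=20000):
    """CDF by midpoint-rule integration of the PDF from 0 to x."""
    h = x / steps
    total = sum(ext_weibull_pdf((k + 0.5) * h, alpha, beta, gamma)
                for k in range(steps))
    return h * total
```

For $\beta = \gamma - 1$ the numerical CDF should agree with the closed form $1 - e^{-(x/\alpha)^\gamma}$, and for any admissible parameters it should approach one at large $x$, which confirms the normalisation factor.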

There are different ways to fit the theoretical extreme
value distribution to the numerical CDF. We could for
example maximise the Kolmogorov-Smirnov probability.
Instead we decided to use a Bayesian approach: We
consider the numerical values as "data points" $d_i$ for the
true CDF and use the theoretical distribution as a model
with parameters $\theta_j$. For each data point the probability
is then given by $p(d_i|\theta) = F(d_i)$. As all the data points
are independent, we can define a likelihood function $\mathcal{L}$ as

\begin{align}
\chi^2 &= -2 \log(\mathcal{L}(\theta)) \tag{B11} \\
       &= -2 \log \prod_i F(d_i) \tag{B12} \\
       &= -2 \sum_i \log(F(d_i)) \tag{B13}
\end{align}

We can then easily compute the posterior probability of
the parameters $\theta$ that describe the distribution with a
Markov-chain Monte Carlo method.




FIG. 13: The PDF of $\lambda$ for a T[6,6,1] topology maximised
over rotations (black histogram, $10^4$ samples) and the best
fit using an extended Weibull distribution (red curve). The
Kolmogorov-Smirnov probability of the fit is 42%.



APPENDIX C: A DISTANCE BETWEEN 
TOPOLOGIES 

1. The Kullback-Leibler divergence 

Let us consider the following question: What is the
expectation value of the ratio of the likelihoods for
covariance matrices $A$ and $B$ if the $a_s$ are distributed according
to a correlation matrix $A$? We have already computed
the log-likelihood in section III; the first case is simply

$$\langle \log \mathcal{L} \rangle = -\frac{1}{2} \left( \log|A| + \mathrm{tr}(1) \right) \tag{C1}$$

and the second one

$$\langle \log \mathcal{L} \rangle = -\frac{1}{2} \left( \log|B| + \mathrm{tr}(B^{-1} A) \right) \tag{C2}$$

The difference between the two expressions is the
logarithm of the likelihood ratio,

$$\langle \Delta \log \mathcal{L} \rangle = -\frac{1}{2} \left( \log|A| - \log|B| + \mathrm{tr}(1 - B^{-1} A) \right) \tag{C3}$$

This is precisely the Kullback-Leibler divergence between
the two Gaussian distributions described by $A$ and $B$.

The Kullback-Leibler (KL) divergence is in general
defined for two probability distributions $p$ and $q$ as

$$D_{KL}(p\|q) = \int p \log\frac{p}{q} \, dx . \tag{C4}$$

topology      l_max   tr(B^-1)   log(|B|)   D_KL(1||B)
infinite      16      288        0          0
T[2,2,2]      16      4661       -486       1944
T[4,4,4]      16      1570       -192       545
T[15,15,6]    16      309        -8         6
T[6,6,1]      16      20781      -399       10047

TABLE IX: Some key quantities for computing the KL
divergence. The whitening enforces tr($B$) = $s_{\max}$ (= 288).



Notice that this is not symmetric, so the symmetrised
form $D(p\|q) + D(q\|p)$ is sometimes used if it is not
clear which distribution is the fundamental one. In
information theory the KL divergence describes the
relative entropy (or information) between the two
probability distributions $p$ and $q$. This corresponds for example
to the amount of information wasted when trying to
describe data distributed as $p$ with a model based on $q$ (see
e.g. 0).

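We consider the KL divergence for random variables
$x$ which have a normal distribution with zero mean and
covariance matrix $A$,

$$p(A, x) = (2\pi)^{-n/2} |A|^{-1/2} \exp\left( -\frac{1}{2} x^\dagger A^{-1} x \right) , \tag{C5}$$

where $n$ is the dimension of $x$.
We can derive an expression for the KL divergence
directly in terms of the covariance matrices by evaluating
the Gaussian integrals:

$$D_{KL}(A\|B) = \int p(A) \log\frac{p(A)}{p(B)} \, dx = -\frac{1}{2} \left( \log|A| - \log|B| + \mathrm{tr}\left[ 1 - B^{-1} A \right] \right) . \tag{C6}$$

This is the same expression as Eq. (C3).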
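
The closed form for two zero-mean Gaussians is straightforward to evaluate numerically. The sketch below is an illustration only: the function names are hypothetical, natural logarithms are assumed, and `kl_from_identity` simply specialises the expression to a whitened sky with $A = 1$, for which only tr($B^{-1}$), $\log|B|$ and $s_{\max}$ are needed.

```python
import numpy as np

def kl_gaussian(A, B):
    """D_KL(A||B) for zero-mean Gaussians with covariance matrices A and B."""
    n = A.shape[0]
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_B = np.linalg.slogdet(B)
    trace_term = np.trace(np.linalg.solve(B, A)).real  # tr(B^-1 A)
    # -1/2 (log|A| - log|B| + tr(1 - B^-1 A)), with tr(1) = n
    return -0.5 * (logdet_A - logdet_B + n - trace_term)

def kl_from_identity(trace_B_inv, logdet_B, s_max):
    """D_KL(1||B) from tr(B^-1), log|B| and the matrix size s_max."""
    return 0.5 * (trace_B_inv - s_max + logdet_B)
```

As a consistency check, `kl_from_identity(4661, -486, 288)` evaluates to 1943.5, in line with the T[2,2,2] row of Table IX when its columns are combined directly.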

We have encountered the KL divergence in section VII
where we used it as a zeroth order approximation to the
evidence. In general, it is not rotationally invariant. But
although we cannot use it directly, we can define a
distance between two topologies if their correlation matrices
are aligned along the same symmetry axes. $D_{KL}(A\|B)$
corresponds then to the maximal signal that we can
expect. The base of the logarithm that we use corresponds
to a choice of units: in information theory the
conventional choice is base 2, corresponding to bits. We quote
the numerical results to base 10, as it makes it easy to
interpret the result: if $D_{KL}(A\|B) = 3$ then we can (at
best!) expect to distinguish the topologies at the 1000:1
level. If $D_{KL}(A\|B) < 2$ then it will be very difficult to
distinguish the two topologies. Of course the
Kullback-Leibler divergence depends also on the resolution, $\ell_{\max}$.

When comparing to results quoted as number of
standard deviations, we use that for a Gaussian random
variable

$$P(|x| > \nu\sigma) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\nu}^{\nu} e^{-x^2/2} \, dx = 1 - \mathrm{erf}(\nu/\sqrt{2}) . \tag{C7}$$

For $x \gg 1$ we can well approximate $1 - \mathrm{erf}(x)$ by
$\exp(-x^2)/(\sqrt{\pi} x)$. In figure 14 we plot $\log_{10}(P(|x| > \nu\sigma))$
against $\nu\sigma$ to make it easy to compare the two quantities.
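
The conversion between a number of standard deviations and $\log_{10}(P)$ can be written down in a few lines. This is a minimal sketch with illustrative function names, using only the relation of Eq. (C7) and the large-argument approximation quoted above.

```python
import math

def log10_two_sided_tail(nu):
    """log10 of P(|x| > nu*sigma) = 1 - erf(nu / sqrt(2)) for a Gaussian."""
    return math.log10(math.erfc(nu / math.sqrt(2.0)))

def log10_tail_asymptotic(nu):
    """Same quantity from the approximation exp(-x^2) / (sqrt(pi) x)."""
    x = nu / math.sqrt(2.0)
    return (-x * x - math.log(math.sqrt(math.pi) * x)) / math.log(10.0)

if __name__ == "__main__":
    for nu in (2, 5, 10, 15, 20):
        print(nu, log10_two_sided_tail(nu), log10_tail_asymptotic(nu))
```

The exact expression relies on `math.erfc`, which underflows for very large arguments; over the range of a few to about twenty standard deviations both forms can be evaluated and compared.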




FIG. 14: The probability that a Gaussian random variable is
more than $n$ standard deviations away from the mean. This
figure helps to compare the results expressed in number of $\sigma$
with those expressed as $\log_{10}(P)$.

FIG. 15: The Kullback-Leibler divergence
$D_{KL}(1\|T[15,15,X])$ between an infinite universe and a
slab-space, as a function of the size of the smallest dimension
$X$ (in units of $1/H_0$). We show curves for $\ell_{\max}$ = 8, 12 and 16. For $X > 3$
the topology becomes difficult to detect and for $X > 6$ it is
basically impossible for any $\ell_{\max}$. Compare with Fig. [S]



2. Information theoretical limits on detecting a topology

As we have already mentioned several times, a FLRW universe
with the trivial topology is homogeneous and isotropic.
Correspondingly its correlation matrix is rotationally
invariant. In this special case the KL divergence also does
not depend on the relative orientation of the two
universes. The quantity $D_{KL}(1\|A)$ measures therefore
directly how much "information" separates the universe
with the topology described by $A$ from an infinitely large
universe. If there is not enough information, then we will
never be able to detect that topology.

Figure 15 shows the KL divergence between an
infinite universe and a T[15,15,X] topology for different
$X = L H_0$ and $\ell_{\max}$. We see that the distance falls
rapidly for $L > 6/H_0$. Even increasing $\ell_{\max}$ does not
help as the correlation matrices become essentially
identical. Hence, even though we can still detect correlations
in spite of this universe being larger than the particle
horizon in all directions, we will not be able to
distinguish it from an infinite universe at a significant level.

3. Comparing different templates



If the topology of the universe is non-trivial then we
will end up using different correlation matrices until one
fits. If a template is completely wrong we expect to see
no signal at all. However, if the template belongs to a
topology which is "similar" to the real one, then we may
find a reduced signal.

What does similar mean in this context? As an
example, let's assume that either the universe has a T[2,2,2]
topology while we test with T[4,4,4] or the opposite. In
the first case, the signal is actually too strong, and we end
up finding a correlation of order unity ($\lambda = 0.91 \pm 0.05$),
but we pay the price of too much noise. If we had used the
T[2,2,2] template, our detection would have been more
significant. On the other hand, if we use the T[2,2,2]
template for a T[4,4,4] universe then the correlation is
smaller ($\lambda = 0.11 \pm 0.02$) while the (non-maximised) value
for infinite universes is $\lambda = \pm 0.02$. Overall, it seems
better to test first the largest universe that can still be
distinguished from an infinite one.

This is also borne out by the Kullback-Leibler
divergence between T[4,4,4] and T[2,2,2], shown in
Fig. 16. We find for example with $\ell_{\max} = 16$
that $D_{KL}(T[4,4,4]\|T[2,2,2]) \approx 2000$ while
$D_{KL}(T[2,2,2]\|T[4,4,4]) = 265$. Both are smaller
than $D_{KL}(1\|T[2,2,2])$ and the latter is smaller than
$D_{KL}(1\|T[4,4,4])$, indicating that it is possible to detect
a T[2,2,2] universe with a T[4,4,4] template. Another
possible use of the Kullback-Leibler divergence is
therefore to map out the space of topologies and to identify

FIG. 16: The scaling of the Kullback-Leibler divergence
as function of $\ell_{\max}$. The curves show $D_{KL}(1\|T[2,2,2])$
(blue) and $D_{KL}(1\|T[4,4,4])$ (red). Both keep increasing
for the whole range of $\ell_{\max}$ considered, showing that there
is information on these topologies even at relatively small
scales. We also plot $D_{KL}(T[4,4,4]\|T[2,2,2])$ (cyan) and
$D_{KL}(T[2,2,2]\|T[4,4,4])$. We argue that the smallness of the
latter curve shows that it is possible to detect a T[2,2,2]
universe with a T[4,4,4] template.